BAYESIAN ANALYSIS FOR REVERSIBLE MARKOV CHAINS
Stanford University and Eindhoven University of Technology

We introduce a natural conjugate prior for the transition matrix of a reversible Markov chain. This allows estimation and testing. The prior arises from random walk with reinforcement in the same way the Dirichlet prior arises from Pólya's urn. We give closed form normalizing constants, a simple method of simulation from the posterior and a characterization along the lines of W. E. Johnson's characterization of the Dirichlet prior.

1. Introduction. Modeling with Markov chains is an important part 
of time series analysis, genomics and many other applications. Reversible 
Markov chains are a mainstay of computational statistics through the Gibbs 
sampler, Metropolis algorithm and their many variants. Reversible chains 
are widely used natural models in physics and chemistry where reversibility 
(often called detailed balance) is a stochastic analog of the time reversibility 
of Newtonian mechanics. 

This paper develops tools for a Bayesian analysis of the transition probabilities, stationary distribution and future prediction of a reversible Markov chain. We observe $X_0 = v_0, X_1 = v_1, \ldots, X_n = v_n$ from a reversible Markov chain with a finite state space $V$. Neither the stationary distribution $\nu(v)$ nor the transition kernel $k(v,v')$ is assumed known. Reversibility entails $\nu(v)k(v,v') = \nu(v')k(v',v)$ for all $v,v'\in V$. We also assume we know which transitions are possible [for which $v,v'\in V$ is $k(v,v') > 0$].

In Section 2 we introduce a family of natural conjugate priors. These are defined via closed form densities and by a generalization of Pólya's urn to random walk with reinforcement on a graph. The density gives normalizing constants needed for testing independence versus reversibility or reversibility versus a full Markovian specification. The random walk gives a simple method of simulating from the posterior (Section 4.5).
Properties of the prior are developed in Section 4. The family is closed
under sampling (Proposition 4.1). Mixtures of our conjugates are shown to 
be dense (Proposition 4.5). A characterization of the priors via predictive 
properties of the posterior is given (Section 4.2). 

A practical example is given in Section 5. Several simple hypotheses are tested for a data set arising from the DNA of the human HLA-B gene. Section 5 also contains remarks about statistical analysis for reversible chains.

2. A class of prior distributions. We observe $X_0 = v_0, X_1 = v_1, \ldots, X_n = v_n$ from a reversible Markov chain with a finite state space $V$ and unknown transition kernel $k(\cdot,\cdot)$.

Let $G = (V,E)$ be the finite graph with vertex set $V$ and edge set $E$ defined as follows: $e = \{v,v'\}\in E$ (i.e., there is an edge between $v$ and $v'$) if and only if $k(v,v') > 0$. We assume that $k(v,v') > 0$ iff $k(v',v) > 0$. In particular, all edges of $G$ are undirected and an edge is denoted by the set of its endpoints. For some vertices $v$, we may have $k(v,v) > 0$. Define the simplex

$$\Delta := \Big\{x = (x_e)_{e\in E}\in(0,1]^E : \sum_{e\in E} x_e = 1\Big\}. \tag{1}$$



Remark 2.1. The distribution of a reversible Markov chain can be described by putting on the edge between $v$ and $v'$ the weight $x_{\{v,v'\}} := \nu(v)k(v,v') = \nu(v')k(v',v)$. If the weights are normalized so that $\sum_{e\in E} x_e = 1$, this is a unique way to describe the distribution of the Markov chain. A transition from $v$ to $v'$ is made with probability proportional to the weight $x_{\{v,v'\}}$.

Denote by $Q_{v_0,x}$ the distribution of the Markov chain induced by the weights $x = (x_e)_{e\in E}\in\Delta$ which starts with probability 1 in $v_0$. Using this notation, our assumption says that the observed data come from a distribution in the class

$$\mathcal{Q} := \{Q_{v_0,x} : v_0\in V,\ x\in\Delta\}. \tag{2}$$
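The following Python sketch is illustrative only (the helper name `chain_from_weights` and the encoding of an edge as the frozenset of its endpoints are our own choices). It turns edge weights into the one-step probabilities of Remark 2.1 with exact fractions, so that detailed balance can be checked directly. The stationary law is taken proportional to $x_v$, which makes $\nu(v)\,x_e/x_v = x_e/\sum_w x_w$ symmetric in the two endpoints of $e$.

```python
from fractions import Fraction


def chain_from_weights(vertices, weights):
    """Transition probabilities of the reversible chain Q_{v0,x} of Remark 2.1.

    `weights` maps each edge to its weight x_e; an edge is the frozenset of its
    endpoints, so a loop at v is frozenset({v}).  From v the chain uses an
    incident edge e with probability x_e / x_v, where x_v is the total weight
    of the edges incident to v (a loop counted once), as in (4).
    """
    x_v = {v: sum((Fraction(w) for e, w in weights.items() if v in e), Fraction(0))
           for v in vertices}
    P = {v: {u: Fraction(0) for u in vertices} for v in vertices}
    for e, w in weights.items():
        ends = sorted(e)
        if len(ends) == 1:          # loop: the chain holds at v
            v = ends[0]
            P[v][v] += Fraction(w) / x_v[v]
        else:                       # ordinary edge, usable in both directions
            v, u = ends
            P[v][u] += Fraction(w) / x_v[v]
            P[u][v] += Fraction(w) / x_v[u]
    return P, x_v


# Triangle {1, 2, 3} with a loop at vertex 1; the weights sum to 1.
V = [1, 2, 3]
x = {frozenset({1, 2}): Fraction(1, 2), frozenset({2, 3}): Fraction(1, 4),
     frozenset({1, 3}): Fraction(1, 8), frozenset({1}): Fraction(1, 8)}
P, x_v = chain_from_weights(V, x)
nu = {v: x_v[v] / sum(x_v.values()) for v in V}  # stationary law, proportional to x_v
print(P[1][2], P[1][1], nu[1])  # 2/3 1/6 2/5
```

Exact `Fraction` arithmetic is used so that detailed balance, $\nu(v)P(v,u) = \nu(u)P(u,v)$, can be tested with `==` rather than with a tolerance.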

2.1. A minimal sufficient statistic. If the endpoints of an edge $e$ agree, we call $e$ a loop. Let

$$E_{\mathrm{loop}} := \{e\in E : e \text{ is a loop}\}. \tag{3}$$

For an edge $e$, denote the set of its endpoints by $\bar e$. For $x = (x_e)_{e\in E}\in(0,\infty)^E$ and a vertex $v$, define $x_v$ to be the sum of all components $x_e$ with $e$ incident to $v$,

$$x_v := \sum_{e:\, v\in\bar e} x_e. \tag{4}$$

In sums such as this the sum is over edges including loops; in particular, a loop at $v$ contributes its weight once to $x_v$.
Let $\pi := (\pi_0, \pi_1, \ldots, \pi_n)$ be an admissible path in $G$. Define

$$k_v(\pi) := \big|\{i\in\{1,2,\ldots,n\} : \pi_{i-1} = v\}\big| \quad\text{for } v\in V, \tag{5}$$

$$k_e(\pi) := \begin{cases} \big|\{i\in\{1,2,\ldots,n\} : \{\pi_{i-1},\pi_i\} = e\}\big|, & \text{for } e\in E\setminus E_{\mathrm{loop}},\\ 2\,\big|\{i\in\{1,2,\ldots,n\} : \{\pi_{i-1},\pi_i\} = e\}\big|, & \text{for } e\in E_{\mathrm{loop}}. \end{cases} \tag{6}$$

That is, $k_v(\pi)$ equals the number of times the path $\pi$ leaves vertex $v$; for an edge $e$ which is not a loop, $k_e(\pi)$ is the number of traversals of $e$ by $\pi$, and for a loop $e$, $k_e(\pi)$ is twice the number of traversals of $e$. Recall that the edges are undirected; hence, $k_e(\pi)$ counts the traversals of $e$ in both directions. Set

$$Z_n := (X_0, X_1, \ldots, X_n). \tag{7}$$
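The counts (5) and (6) can be read off a path in one pass. In this minimal Python sketch (the name `transition_counts` is illustrative; edges are again frozensets) a loop traversal adds 2 to $k_e$, exactly as in (6):

```python
from collections import Counter


def transition_counts(path):
    """Return (k_v, k_e) of (5) and (6) for a path (pi_0, ..., pi_n).

    k_v[v] is the number of departures from v; k_e[e] counts traversals of the
    undirected edge e (a frozenset) in both directions, a loop counting 2 per traversal.
    """
    k_v, k_e = Counter(), Counter()
    for prev, cur in zip(path, path[1:]):
        k_v[prev] += 1
        k_e[frozenset({prev, cur})] += 2 if prev == cur else 1
    return k_v, k_e


k_v, k_e = transition_counts((1, 2, 3, 1, 1, 2))
print(k_v[1], k_e[frozenset({1, 2})], k_e[frozenset({1})])  # 3 2 2
```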



Proposition 2.2. The vector of transition counts $(k_e(Z_n))_{e\in E}$ is a minimal sufficient statistic for the model $\mathcal{Q}_{v_0} := \{Q_{v_0,x} : x\in\Delta\}$.

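Proof. Let $\pi$ be an admissible path in $G$. In order to prove that $(k_e(Z_n))_{e\in E}$ is a sufficient statistic, we need to show that

$$Q_{v_0,x}\big(Z_n = \pi \,\big|\, (k_e(Z_n))_{e\in E}\big) \tag{8}$$

does not depend on $x$. If $\pi$ does not start in $v_0$, (8) equals zero. Otherwise, we have

$$Q_{v_0,x}(Z_n = \pi) = \frac{\prod_{e\in E\setminus E_{\mathrm{loop}}} x_e^{k_e(\pi)}\prod_{e\in E_{\mathrm{loop}}} x_e^{k_e(\pi)/2}}{\prod_{v\in V} x_v^{k_v(\pi)}}. \tag{9}$$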

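It is not hard to see that $k_v(\pi)$ can be expressed in terms of the $k_e(\pi)$ and the first observation $v_0$: every departure from $v$ and every arrival at $v$ contributes 1 to $\sum_{e:\,v\in\bar e}k_e(\pi)$, so that $2k_v(\pi) = \sum_{e:\,v\in\bar e}k_e(\pi) + \mathbf{1}\{v = v_0\} - \mathbf{1}\{v = \pi_n\}$, and the endpoint $\pi_n$ is the unique vertex other than $v_0$ at which this sum is odd (or $\pi_n = v_0$ if there is no such vertex). Hence, the $Q_{v_0,x}$-probability of $\pi$ depends only on $k_e(\pi)$, $e\in E$, and $v_0$. Thus, (8) equals one divided by the number of admissible paths $\pi'$ with starting point $v_0$ and $k_e(\pi') = k_e(\pi)$ for all $e\in E$, which is independent of $x$.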

Suppose $K := (k_e)_{e\in E}$ is not minimal. Then there exists a sufficient statistic $K'$ which needs less information than $K$. Consequently, there exist two admissible paths $\pi$ and $\pi'$ starting in $v_0$ such that $K(\pi)\neq K(\pi')$ and $K'(\pi) = K'(\pi')$. Then

$$\frac{Q_{v_0,x}(Z_n = \pi\mid K'(Z_n) = K'(\pi))}{Q_{v_0,x}(Z_n = \pi'\mid K'(Z_n) = K'(\pi'))} = \frac{Q_{v_0,x}(Z_n = \pi)}{Q_{v_0,x}(Z_n = \pi')} = \prod_{e\in E\setminus E_{\mathrm{loop}}} x_e^{k_e(\pi)-k_e(\pi')}\prod_{e\in E_{\mathrm{loop}}} x_e^{(k_e(\pi)-k_e(\pi'))/2}\prod_{v\in V} x_v^{k_v(\pi')-k_v(\pi)}. \tag{10}$$

Since by assumption $(k_e(\pi))_{e\in E}\neq(k_e(\pi'))_{e\in E}$, the last quantity depends on $x$. This contradicts the fact that $K'$ is a sufficient statistic. □

Fig. 1. The triangle.
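Formula (9) can be checked exactly on a small case. In the Python sketch below (helper names are hypothetical; edges are frozensets and loops are counted once in $x_v$), the probability of a path computed step by step agrees with the count-based right-hand side of (9). The two orientations $(1,2,3,1)$ and $(1,3,2,1)$ of the triangle are equivalent paths with different directed transitions, and they receive the same probability, as sufficiency of the edge counts requires.

```python
from collections import Counter
from fractions import Fraction


def x_vertex(weights, v):
    """x_v from (4): total weight of the edges incident to v, a loop counted once."""
    return sum((Fraction(w) for e, w in weights.items() if v in e), Fraction(0))


def path_probability(path, weights):
    """Q_{v0,x}(Z_n = path) as a product of one-step probabilities x_e / x_v."""
    prob = Fraction(1)
    for prev, cur in zip(path, path[1:]):
        prob *= Fraction(weights[frozenset({prev, cur})]) / x_vertex(weights, prev)
    return prob


def probability_from_counts(path, weights):
    """Right-hand side of (9): uses the path only through k_e and k_v."""
    k_v, k_e = Counter(), Counter()
    for prev, cur in zip(path, path[1:]):
        k_v[prev] += 1
        k_e[frozenset({prev, cur})] += 2 if prev == cur else 1
    num = Fraction(1)
    for e, k in k_e.items():
        num *= Fraction(weights[e]) ** (k // 2 if len(e) == 1 else k)  # loops: exponent k_e/2
    den = Fraction(1)
    for v, k in k_v.items():
        den *= x_vertex(weights, v) ** k
    return num / den


x = {frozenset({1, 2}): Fraction(1, 2), frozenset({2, 3}): Fraction(1, 3),
     frozenset({1, 3}): Fraction(1, 6)}
# (1,2,3,1) and (1,3,2,1) use each edge once, in opposite directions: equivalent paths.
p_forward = path_probability((1, 2, 3, 1), x)
p_backward = path_probability((1, 3, 2, 1), x)
print(p_forward, p_backward, probability_from_counts((1, 2, 3, 1), x))  # 1/10 1/10 1/10
```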

2.2. Definition of the prior densities. Our aim is to define a class of prior distributions in terms of measures on $\Delta$. We prepare the definition with some notation. We illustrate the definitions by considering a three state process with states $\{1,2,3\}$, all transitions possible, but no holding. This leads to the graph in Figure 1.

Denote the cardinality of a set $S$ by $|S|$. Recall the definition (3) of the set $E_{\mathrm{loop}}$. Set

$$l := |V| + |E_{\mathrm{loop}}| \quad\text{and}\quad m := |E|. \tag{11}$$

For the three state example, $m = l = 3$.

Remark 2.3. There is a simple way to delineate a generating set of cycles of $G$. We call a maximal subgraph of $G$ which contains all loops but no cycle a spanning tree of $G$. Choose a spanning tree $T$. Each edge $e\in E\setminus E_{\mathrm{loop}}$ which is not in $T$ forms a cycle $c_e$ when added to $T$. (By definition, a loop is never a cycle and never contained in a cycle.) Since $T$ consists of $|V|-1$ ordinary edges together with all $|E_{\mathrm{loop}}|$ loops, that is, of $l-1$ edges, there are $m-l+1$ such cycles, and we enumerate them arbitrarily: $c_1,\ldots,c_{m-l+1}$. This set of cycles forms an additive basis for the homology $H_1$ and also serves for our purposes.

For the three state example, we may choose $T$ to have edges $\{1,2\}$ and $\{1,3\}$. Then there is one cycle $c_1$, oriented (say) $1\to2\to3\to1$. In Section 3.4 we show how such a basis of cycles can be obtained for the complete graph.

In general, the first Betti number $\beta_1$ is the dimension of $H_1$. For the complete graph, $\beta_1(K_n) = \binom{n-1}{2}$. Further details can be found in [8], Section 1.16.
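The construction of Remark 2.3 is easy to mechanize. The Python sketch below is one possible implementation, not the paper's. It grows a spanning tree with union-find, skips loops (they belong to every spanning tree and never form cycles), and closes one cycle per remaining edge. The function name and the encoding of edges as pairs $(u,v)$, with a loop written $(v,v)$, are assumptions of the sketch. The test checks that the number of cycles is $m-l+1$, which for $K_n$ with loops equals $\binom{n-1}{2}$.

```python
def fundamental_cycles(vertices, edges):
    """Cycle basis of Remark 2.3: one cycle c_e per non-loop edge e outside a spanning tree.

    `edges` is a list of pairs (u, v); a loop is (v, v).  Loops are always part of
    the spanning tree and never create cycles.  Each cycle is returned as a closed
    vertex list whose first and last entries coincide.
    """
    root = {v: v for v in vertices}

    def find(v):                      # union-find representative of v
        while root[v] != v:
            root[v] = root[root[v]]
            v = root[v]
        return v

    tree = {v: [] for v in vertices}
    outside = []
    for u, v in edges:
        if u == v:
            continue
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge joins two components: keep it in T
            root[ru] = rv
            tree[u].append(v)
            tree[v].append(u)
        else:                         # edge closes a cycle with T
            outside.append((u, v))

    def tree_path(s, t):              # the unique path from s to t inside T
        stack, seen = [(s, [s])], {s}
        while stack:
            node, path = stack.pop()
            if node == t:
                return path
            for nb in tree[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append((nb, path + [nb]))
        raise ValueError("G is not connected")

    return [tree_path(v, u) + [v] for u, v in outside]


# K_4 without loops: the star {1,2}, {1,3}, {1,4} is chosen as T,
# and each of the three remaining edges closes a triangle through vertex 1.
print(fundamental_cycles([1, 2, 3, 4], [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]))
```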

Definition 2.4. Orient the cycles $c_1,\ldots,c_{m-l+1}$ and all edges $e\in E$ in an arbitrary way. For every $x\in\Delta$, define a matrix $A(x) = (A_{i,j}(x))_{1\le i,j\le m-l+1}$ by

$$A_{i,i}(x) = \sum_{e\in c_i}\frac{1}{x_e}, \qquad A_{i,j}(x) = \sum_{e\in c_i\cap c_j}\pm\frac{1}{x_e} \quad\text{for } i\neq j, \tag{12}$$

where the signs in the last sum are chosen to be $+1$ or $-1$ depending on whether the edge $e$ has the same orientation in $c_i$ and $c_j$ or not.

In the three state example, the matrix $A(x)$ is $1\times1$ with entry $x_{\{1,2\}}^{-1} + x_{\{2,3\}}^{-1} + x_{\{1,3\}}^{-1}$.

Recall the definition (4) of $x_v$. Similarly, define $a_v$ for $a := (a_e)_{e\in E}\in(0,\infty)^E$. The main definition of this section (the conjugate prior) follows.

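Definition 2.5. For all $v_0\in V$ and $a := (a_e)_{e\in E}\in(0,\infty)^E$, define

$$\phi_{v_0,a}(x) := \frac{1}{Z_{v_0,a}}\,\frac{\prod_{e\in E\setminus E_{\mathrm{loop}}} x_e^{a_e-1/2}\prod_{e\in E_{\mathrm{loop}}} x_e^{(a_e/2)-1}}{x_{v_0}^{a_{v_0}/2}\prod_{v\in V\setminus\{v_0\}} x_v^{(a_v+1)/2}}\,\sqrt{\det(A(x))} \tag{13}$$

for $x := (x_e)_{e\in E}\in\Delta$, with

$$Z_{v_0,a} := \frac{\prod_{e\in E}\Gamma(a_e)}{\Gamma(a_{v_0}/2)\prod_{v\in V\setminus\{v_0\}}\Gamma((a_v+1)/2)\prod_{e\in E_{\mathrm{loop}}}\Gamma((a_e+1)/2)}\times\frac{(m-1)!\,\pi^{(l-1)/2}}{2^{1-l+\sum_{e\in E}a_e}}. \tag{14}$$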

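For the three state example with parameters $x_{\{1,2\}} = x$, $x_{\{2,3\}} = y$, $x_{\{1,3\}} = z$, $a_{\{1,2\}} = a$, $a_{\{2,3\}} = b$, $a_{\{1,3\}} = c$, and $v_0 = 1$, the prior density $\phi_{1,a}$ is

$$\phi_{1,a}(x,y,z) = \frac{1}{Z}\,\frac{x^{a-1/2}\,y^{b-1/2}\,z^{c-1/2}}{(x+z)^{(a+c)/2}\,(x+y)^{(a+b+1)/2}\,(y+z)^{(b+c+1)/2}}\,\sqrt{\frac1x+\frac1y+\frac1z}, \tag{15}$$

with the normalizing constant $Z$ given by

$$Z = \frac{\Gamma(a)\Gamma(b)\Gamma(c)}{\Gamma((a+c)/2)\,\Gamma((a+b+1)/2)\,\Gamma((b+c+1)/2)}\cdot\frac{2\pi}{2^{a+b+c-2}}. \tag{16}$$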

A derivation of the formula for the density in this special case can be found, 
for example, in [12]. The density for the triangle with loops is given in (31). 
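The normalizing constant (14) is convenient to evaluate on the log scale. The Python sketch below (the name `log_Z` and the encoding of edges as pairs, with a loop written $(v,v)$, are illustrative) implements (14) with `math.lgamma` and compares it with (16) for the triangle. As a second check it uses the path with vertices $0,1,2$, started at $0$, with $a_{\{0,1\}} = \alpha$ and $a_{\{1,2\}} = \beta$. There $\Delta$ is parametrized by $t = x_{\{0,1\}}$, the unnormalized density in (13) reduces to $t^{(\alpha-1)/2}(1-t)^{\beta/2-1}$, and (14) should therefore equal the Beta function $B((\alpha+1)/2,\beta/2)$.

```python
import math


def log_Z(vertices, edges, a, v0):
    """log Z_{v0,a} from (14).

    `edges` is a list of pairs (u, v) (a loop is (v, v)); `a` maps each pair to a_e > 0.
    """
    loops = [e for e in edges if e[0] == e[1]]
    m, l = len(edges), len(vertices) + len(loops)
    a_v = {v: sum(a[e] for e in edges if v in e) for v in vertices}  # loops counted once
    out = sum(math.lgamma(a[e]) for e in edges)
    out -= math.lgamma(a_v[v0] / 2)
    out -= sum(math.lgamma((a_v[v] + 1) / 2) for v in vertices if v != v0)
    out -= sum(math.lgamma((a[e] + 1) / 2) for e in loops)
    out += math.lgamma(m) + (l - 1) / 2 * math.log(math.pi)  # log (m-1)! + log pi^{(l-1)/2}
    out -= (1 - l + sum(a[e] for e in edges)) * math.log(2)
    return out


# Triangle of (15)-(16) with v0 = 1.
a_, b_, c_ = 1.3, 0.7, 2.2
tri_edges = [(1, 2), (2, 3), (1, 3)]
tri_a = {(1, 2): a_, (2, 3): b_, (1, 3): c_}
z16 = (math.lgamma(a_) + math.lgamma(b_) + math.lgamma(c_)
       - math.lgamma((a_ + c_) / 2) - math.lgamma((a_ + b_ + 1) / 2)
       - math.lgamma((b_ + c_ + 1) / 2)
       + math.log(2 * math.pi) - (a_ + b_ + c_ - 2) * math.log(2))
print(abs(log_Z([1, 2, 3], tri_edges, tri_a, 1) - z16))   # zero up to rounding

# Path 0 - 1 - 2 started at 0: Z should be B((alpha + 1)/2, beta/2).
alpha, beta = 1.3, 2.1
log_beta = (math.lgamma((alpha + 1) / 2) + math.lgamma(beta / 2)
            - math.lgamma((alpha + beta + 1) / 2))
print(abs(log_Z([0, 1, 2], [(0, 1), (1, 2)], {(0, 1): alpha, (1, 2): beta}, 0) - log_beta))
```

The second check ties (14) to an integral that can be done by hand, via the duplication formula $\Gamma(z)\Gamma(z+\tfrac12) = 2^{1-2z}\sqrt{\pi}\,\Gamma(2z)$.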

The following proposition shows that the definition of $\phi_{v_0,a}$ is independent of the choice of cycles $c_i$ used in the definition of $A(x)$.

Proposition 2.6. For the matrix $A$ of Definition 2.4, with $\mathcal{T}$ the set of spanning trees of $G$,

$$\det A(x) = \sum_{T\in\mathcal{T}}\ \prod_{e\notin E(T)}\frac{1}{x_e}. \tag{17}$$
Proof. This identity is proved for graphs without loops in [14], page 145, Theorem 3'. By definition, $A(x)$ does not depend on $x_e$, $e\in E_{\mathrm{loop}}$. Furthermore, since every spanning tree contains all loops, the right-hand side of (17) does not depend on $x_e$, $e\in E_{\mathrm{loop}}$, either. In particular, both sides of (17) are the same for $G$ and the graph obtained from $G$ by removing all loops; hence, they are equal. □
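Proposition 2.6 can be checked exactly on a small example. The Python sketch below uses illustrative helper names, and its determinant is a plain Gaussian elimination over fractions rather than a library routine. It builds $A(x)$ of (12) for $K_4$ from the triangle basis $(1,i,j)$ of Section 3.4, enumerates the spanning trees by brute force, and compares $\det A(x)$ with the sum in (17).

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant of a square matrix of Fractions by Gaussian elimination."""
    M = [row[:] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return result


def cycle_matrix(cycles, x):
    """A(x) from (12); each cycle is a closed vertex list, oriented along the list."""
    steps = [list(zip(c, c[1:])) for c in cycles]
    A = [[Fraction(0)] * len(cycles) for _ in cycles]
    for i, si in enumerate(steps):
        for j, sj in enumerate(steps):
            for (u, v) in si:
                for (s, t) in sj:
                    if {u, v} == {s, t}:   # shared edge: +1/x_e if same orientation, else -1/x_e
                        A[i][j] += (1 if (u, v) == (s, t) else -1) / x[frozenset({u, v})]
    return A


def find(root, v):
    while root[v] != v:
        v = root[v]
    return v


def spanning_tree_sum(vertices, edges, x):
    """Right-hand side of (17) for a graph without loops."""
    total = Fraction(0)
    for T in combinations(edges, len(vertices) - 1):
        root, acyclic = {v: v for v in vertices}, True
        for u, v in T:
            ru, rv = find(root, u), find(root, v)
            if ru == rv:
                acyclic = False
                break
            root[ru] = rv
        if acyclic:                       # |V| - 1 edges without a cycle form a spanning tree
            term = Fraction(1)
            for e in edges:
                if e not in T:
                    term /= x[frozenset(e)]
            total += term
    return total


V = [1, 2, 3, 4]
edges = list(combinations(V, 2))                                           # K_4
x = {frozenset(e): Fraction(k, 21) for k, e in enumerate(edges, start=1)}  # sums to 1
cycles = [[1, i, j, 1] for i, j in combinations([2, 3, 4], 2)]             # triangles (1, i, j)
lhs, rhs = det(cycle_matrix(cycles, x)), spanning_tree_sum(V, edges, x)
print(lhs == rhs)
```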

The prior density $\phi_{v_0,a}$ arises in a natural extension of Pólya's urn. We treat this topic next.

2.3. Random walk with reinforcement. Let $\sigma$ denote the Lebesgue measure on $\Delta$, normalized such that $\sigma(\Delta) = 1$. The measures $\phi_{v_0,a}\,d\sigma$ on $\Delta$ arise in the study of edge-reinforced random walk, as was observed by Coppersmith and Diaconis; see [3]. Let us explain this connection:

Definition 2.7. All edges of $G$ are given a strictly positive weight; at time 0 edge $e$ has weight $a_e > 0$. An edge-reinforced random walk on $G$ with starting point $v_0$ is defined as follows: The process starts at $v_0$ at time 0. In each step, the random walker traverses an edge incident to its current position with probability proportional to its weight. Each time an edge $e\in E\setminus E_{\mathrm{loop}}$ is traversed, its weight is increased by 1. Each time a loop $e\in E_{\mathrm{loop}}$ is traversed, its weight is increased by 2.
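Definition 2.7 translates directly into a simulation. The Python sketch below is illustrative: the function name `errw` and the frozenset encoding of edges are our own, and the weighted draw uses `random.Random.choices`. After $n$ steps every weight equals $a_e + k_e(Z_n)$, which is the bookkeeping behind Proposition 4.1 below.

```python
import random


def errw(weights, start, n_steps, rng):
    """Simulate n_steps of the edge-reinforced random walk of Definition 2.7.

    `weights` maps frozenset edges to initial weights a_e > 0 (a loop at v is
    frozenset({v})).  Returns the visited path and the final edge weights.
    """
    w = dict(weights)
    incident = {}
    for e in w:
        for v in e:
            incident.setdefault(v, []).append(e)
    path = [start]
    for _ in range(n_steps):
        v = path[-1]
        edges = incident[v]
        e = rng.choices(edges, weights=[w[f] for f in edges])[0]  # prob. proportional to weight
        is_loop = len(e) == 1
        w[e] += 2 if is_loop else 1           # loops are reinforced by 2, other edges by 1
        path.append(v if is_loop else next(u for u in e if u != v))
    return path, w


a = {frozenset({1, 2}): 1.0, frozenset({2, 3}): 1.0,
     frozenset({1, 3}): 1.0, frozenset({1}): 1.0}
path, final = errw(a, 1, 1000, random.Random(0))
```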

Denote the set of nonnegative integers by $\mathbb{N}_0$. Let $\Omega$ be the set of all $(v_i)_{i\in\mathbb{N}_0}\in V^{\mathbb{N}_0}$ such that $\{v_i,v_{i+1}\}\in E$ for all $i\in\mathbb{N}_0$. Let $X_n : V^{\mathbb{N}_0}\to V$ denote the projection onto the $n$th coordinate. Recall that $Z_n = (X_0, X_1,\ldots,X_n)$. Denote by $P_{v_0,a}$ the distribution on $\Omega$ of an edge-reinforced random walk with starting point $v_0$ and initial edge weights $a = (a_e)_{e\in E}$.

Remark 2.8. Let $\alpha_e(Z_n) := k_e(Z_n)/n$ be the proportion of traversals of edge $e$ up to time $n$. For a finite graph without loops, it was observed by Coppersmith and Diaconis that $\alpha(Z_n) := (\alpha_e(Z_n))_{e\in E}$ converges almost surely to a random variable with distribution $\phi_{v_0,a}\,d\sigma$; see [3] and also [13]. In particular, $\phi_{v_0,a}\,d\sigma$ is a probability measure on $\Delta$. This fact is not at all obvious from the definition of $\phi_{v_0,a}$.

It turns out that an edge-reinforced random walk on $G$ is a mixture of reversible Markov chains, where the mixing measure, described as a measure on edge weights $(x_e)_{e\in E}$, is given by $\phi_{v_0,a}\,d\sigma$. This is made precise by the following theorem.
Fig. 2. Transformation of loops.



Theorem 2.9. Let $(X_n)_{n\in\mathbb{N}_0}$ be an edge-reinforced random walk with initial weights $a = (a_e)_{e\in E}$ starting at $v_0$, and let $Z_n = (X_0, X_1,\ldots,X_n)$. For any admissible path $\pi = (v_0,\ldots,v_n)$, the following holds:

$$P_{v_0,a}(Z_n = \pi) = \int_\Delta\prod_{i=1}^n\frac{x_{\{v_{i-1},v_i\}}}{x_{v_{i-1}}}\,\phi_{v_0,a}(x)\,d\sigma(x); \tag{18}$$

here $x := (x_e)_{e\in E}$. Hence, if $Q_{v_0,a}$ is the mixture of Markov chains where the mixing measure, described as a measure on edge weights $(x_e)_{e\in E}$, is given by $\phi_{v_0,a}\,d\sigma$, then

$$P_{v_0,a} = Q_{v_0,a}. \tag{19}$$

Proof. If $G$ has no loops, then the claim is true by Theorem 3.1 of [16].

Let $G$ be a graph with loops. Define a graph $G' := (V',E')$ as follows: Replace every loop of $G$ by an edge joining the same vertex to a new vertex of degree 1 (see Figure 2). More precisely, for all $e\in E_{\mathrm{loop}}$, let $v(e)$ be the vertex $e$ is incident to and let $v'(e)$ be an additional vertex, different from all the others. Then, set $g(e) := \{v(e), v'(e)\}$ and

$$V' := V\cup\{v'(e) : e\in E_{\mathrm{loop}}\}, \tag{20}$$

$$E' := [E\setminus E_{\mathrm{loop}}]\cup\{g(e) : e\in E_{\mathrm{loop}}\}. \tag{21}$$



The graph $G'$ has no loops and the claim of the theorem is true for $G'$.

Let $P'_{v_0,b}$ be the distribution of a reinforced random walk on $G'$ starting at $v_0$ with initial weights $b = (b_{e'})_{e'\in E'}$ defined by

$$b_{e'} := \begin{cases} a_{e'}, & \text{if } e'\in E\setminus E_{\mathrm{loop}},\\ a_e, & \text{if } e' = g(e) \text{ for some } e\in E_{\mathrm{loop}}. \end{cases} \tag{22}$$



Any finite admissible path $\pi = (\pi_0 = v_0, \pi_1,\ldots,\pi_n)$ in $G$ can be mapped to an admissible path $\pi' = (\pi'_0 = v_0, \pi'_1,\ldots,\pi'_{n'})$ in $G'$ by mapping every traversal of a loop $e\in E_{\mathrm{loop}}$ in $\pi$ to a traversal of $(v(e), v'(e), v(e))$ in $\pi'$ [i.e., a traversal of the edge $g(e)$ back and forth in $\pi'$]. The probability that the reinforced random walk on $G$ traverses $\pi$ agrees with the probability that the reinforced random walk on $G'$ traverses $\pi'$. [Note that for $G$ and $G'$ the following is true: Between any two successive visits to $v(e)$, the sum of the weights of all edges incident to $v(e)$ increases by 2.] Since the claim of the theorem is true for $G'$, it follows that

$$P_{v_0,a}(Z_n = \pi) = P'_{v_0,b}(Z'_{n'} = \pi') = \int_\Delta\prod_{i=1}^{n'}\frac{x_{\{\pi'_{i-1},\pi'_i\}}}{x_{\pi'_{i-1}}}\,\phi'_{v_0,b}(x)\,d\sigma(x), \tag{23}$$

where $\phi'_{v_0,b}$ denotes the density corresponding to $G'$, starting point $v_0$ and initial weights $b$; here $E'$ is identified with $E$ via $g$, so that both densities live on the same simplex $\Delta$. We claim that the right-hand side of (23) equals

$$\int_\Delta\prod_{i=1}^{n}\frac{x_{\{\pi_{i-1},\pi_i\}}}{x_{\pi_{i-1}}}\,\phi_{v_0,a}(x)\,d\sigma(x). \tag{24}$$

Note that a traversal of $e\in E_{\mathrm{loop}}$ contributes $x_e/x_{v(e)}$ to the integrand in (24), whereas a traversal of $(v(e), v'(e), v(e))$ contributes $x_{g(e)}/x_{v(e)}$ to the integrand in (23) [the step from $v'(e)$ back to $v(e)$ has probability $x_{g(e)}/x_{v'(e)} = 1$]. Furthermore, $e\in E_{\mathrm{loop}}$ contributes

$$\frac{\Gamma((a_e+1)/2)}{\Gamma(a_e)}\,2^{a_e}\,x_e^{(a_e/2)-1} \tag{25}$$

to $\phi_{v_0,a}$, whereas the contribution of the edge $g(e)$ and the vertex $v'(e)$ to the density $\phi'_{v_0,b}$ equals

$$\frac{\Gamma((b_{v'(e)}+1)/2)}{\Gamma(b_{g(e)})}\,2^{b_{g(e)}}\,\frac{x_{g(e)}^{b_{g(e)}-1/2}}{x_{v'(e)}^{(b_{v'(e)}+1)/2}} = \frac{\Gamma((a_e+1)/2)}{\Gamma(a_e)}\,2^{a_e}\,\frac{x_{g(e)}^{a_e-1/2}}{x_{g(e)}^{(a_e+1)/2}} = \frac{\Gamma((a_e+1)/2)}{\Gamma(a_e)}\,2^{a_e}\,x_{g(e)}^{(a_e/2)-1}. \tag{26}$$

Finally, $|V| + |E_{\mathrm{loop}}| = |V'|$ and $|E| = |E'|$. Consequently, the expression in (24) agrees with the right-hand side of (23) and the claim follows. □

3. The prior density for special graphs. In this section we write down the densities $\phi_{v_0,a}$ for some special graphs.
3.1. The line graph (Birth and death chains). Consider the line graph with vertex set $V = \{i : 0\le i\le n\}$ and edge set $E = \{\{i,i+1\} : 0\le i\le n-1\}$; see Figure 3. Given $a = (a_{\{i-1,i\}})_{1\le i\le n}$, let $b_i := a_{\{i-1,i\}}$. The variables in the simplex $\Delta$ are denoted $z_i := x_{\{i-1,i\}}$.

Recall that the density of the beta distribution with parameters $b_1, b_2 > 0$ is given by

$$\beta[b_1,b_2](p) := \frac{\Gamma(b_1+b_2)}{\Gamma(b_1)\Gamma(b_2)}\,p^{b_1-1}(1-p)^{b_2-1} \qquad (0<p<1). \tag{27}$$

Set

$$p_i := \frac{z_i}{z_i+z_{i+1}}, \qquad 1\le i\le n-1, \tag{28}$$

and $p := (p_i)_{1\le i\le n-1}$. Clearly, $p_i$ is the probability that the Markov chain with edge weights $z_i$ makes a transition to $i-1$ given it is at $i$. If we make the change of variables (28) in the density $\phi_{v_0,a}$, then we obtain the transformed density $\tilde\phi_{v_0,a}(p)$ given by



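$$\tilde\phi_{v_0,a}(p) = \begin{cases} \displaystyle\prod_{i=1}^{n-1}\beta\Big[\frac{b_i+1}{2},\frac{b_{i+1}}{2}\Big](p_i), & \text{if } v_0 = 0,\\[10pt] \displaystyle\Big[\prod_{i=1}^{v_0-1}\beta\Big[\frac{b_i}{2},\frac{b_{i+1}+1}{2}\Big](p_i)\Big]\,\beta\Big[\frac{b_{v_0}}{2},\frac{b_{v_0+1}}{2}\Big](p_{v_0})\,\Big[\prod_{i=v_0+1}^{n-1}\beta\Big[\frac{b_i+1}{2},\frac{b_{i+1}}{2}\Big](p_i)\Big], & \text{if } v_0\in\{1,2,\ldots,n-1\},\\[10pt] \displaystyle\prod_{i=1}^{n-1}\beta\Big[\frac{b_i}{2},\frac{b_{i+1}+1}{2}\Big](p_i), & \text{if } v_0 = n; \end{cases}$$

here the empty product is defined to be 1.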

With the change of variables (28), the conjugate prior can be described 
as a product of independent beta variables with carefully linked parameters. 
If loops are allowed, the edge weights are independent Dirichlet by a similar 
argument (see Section 3.2). The next example contains a generalization. 



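[Figure: vertices $0,1,\ldots,n$ on a line; the edge $\{i-1,i\}$ carries the weight $z_i$, and from vertex $i$ the chain steps to $i-1$ with probability $p_i$ and to $i+1$ with probability $1-p_i$.]

Fig. 3. The line graph.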
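Because of this product structure, the prior of a birth and death chain can be simulated with independent beta variables. The Python sketch below is illustrative. The function names are hypothetical, the beta parameters are read off from the product formula for $\tilde\phi_{v_0,a}$ above, and the edge weights are recovered by inverting (28) through $z_{i+1}/z_i = (1-p_i)/p_i$, followed by normalization.

```python
import random


def p_from_z(z):
    """Change of variables (28): p_i = z_i / (z_i + z_{i+1}) for i = 1, ..., n-1.

    z = [z_1, ..., z_n], where z_i is the weight of the edge {i-1, i}.
    """
    return [z[i] / (z[i] + z[i + 1]) for i in range(len(z) - 1)]


def z_from_p(p):
    """Inverse of (28): z_{i+1} / z_i = (1 - p_i) / p_i, then normalize to sum 1."""
    z = [1.0]
    for pi in p:
        z.append(z[-1] * (1 - pi) / pi)
    total = sum(z)
    return [zi / total for zi in z]


def sample_line_prior(b, v0, rng):
    """Draw (z_1, ..., z_n) from phi_{v0,a} on the line graph via independent betas.

    b = [b_1, ..., b_n] are the initial edge weights and v0 in {0, ..., n} the start.
    """
    n = len(b)
    p = []
    for i in range(1, n):                  # p_i for i = 1, ..., n-1
        bi, bnext = b[i - 1], b[i]         # b_i and b_{i+1}
        if i < v0:
            alpha, beta = bi / 2, (bnext + 1) / 2
        elif i == v0:
            alpha, beta = bi / 2, bnext / 2
        else:
            alpha, beta = (bi + 1) / 2, bnext / 2
        p.append(rng.betavariate(alpha, beta))
    return z_from_p(p)


print(sample_line_prior([1.0, 2.0, 1.5, 0.5], 2, random.Random(1)))
```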
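3.2. Trees with loops. Recall that the density of the Dirichlet distribution with parameters $b_i > 0$, $1\le i\le d$, is given by

$$D[b_i; 1\le i\le d](p_i; 1\le i\le d) := \frac{\Gamma\big(\sum_{i=1}^d b_i\big)}{\prod_{i=1}^d\Gamma(b_i)}\prod_{i=1}^d p_i^{b_i-1} \qquad\Big(p_i>0,\ \sum_{i=1}^d p_i = 1\Big). \tag{29}$$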

Let $T = (V,E)$ be a tree. Suppose that there is a loop attached to every vertex, that is, $\{v\}\in E$ for all $v\in V$. Let $v_0\in V$. For every $v\in V\setminus\{v_0\}$, there exists a unique shortest path from $v_0$ to $v$. Let $e(v)$ be the unique edge incident to $v$ which is traversed by the shortest path from $v_0$ to $v$. Let $E_v := \{e\in E : v\in e\}$ be the set of all edges incident to $v$. Set

$$p_{v,e} := \frac{x_e}{x_v} \qquad\text{for } v\in V,\ e\in E_v, \tag{30}$$

$p := (p_{v,e})_{v\in V,\,e\in E_v}$, and $p_v := (p_{v,e})_{e\in E_v}$. If we make the change of variables (30) in the density $\phi_{v_0,a}$, the transformed density $\tilde\phi_{v_0,a}(p)$ is given by

$$\tilde\phi_{v_0,a}(p) = D\Big[\frac{a_e}{2},\ e\in E_{v_0}\Big](p_{v_0})\prod_{v\in V\setminus\{v_0\}} D\Big[\frac{a_{e(v)}+1}{2},\ \frac{a_e}{2},\ e\in E_v\setminus\{e(v)\}\Big](p_v).$$



Thus again, in the reparametrization (30), the conjugate prior is seen as a product of independent random variables. This is not true in the following example.

The fact that the density $\phi_{v_0,a}$ for a tree has this particular form was first observed by Pemantle [15].
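The Dirichlet product above also gives a direct sampler for trees with loops. The Python sketch below is illustrative. The tree is encoded by a hypothetical `parent` map with `parent[v0] = None`, so that $e(v) = \{v, \mathrm{parent}(v)\}$. Dirichlet vectors are drawn as normalized gamma variables with `random.Random.gammavariate`, and the weights $x$ are rebuilt from $p_{v,e} = x_e/x_v$ by walking outward from $v_0$.

```python
import random


def incident_edges(parent, v):
    """Edges at v in a tree with a loop at every vertex; edges are frozensets."""
    edges = [frozenset({v})]
    if parent[v] is not None:
        edges.append(frozenset({v, parent[v]}))
    edges += [frozenset({v, c}) for c, u in parent.items() if u == v]
    return edges


def x_from_p(parent, p):
    """Recover normalized weights x from p[v][e] = x_e / x_v, cf. (30).

    `parent[v]` is the neighbour of v on the path to v0 (None for v0), so that
    e(v) = {v, parent[v]}.  The tree is explored from v0 outwards.
    """
    root = next(v for v, u in parent.items() if u is None)
    x_v, order = {root: 1.0}, [root]
    for v in order:                                   # the list grows during the loop
        for c, u in parent.items():
            if u == v:
                e = frozenset({v, c})
                x_v[c] = p[v][e] * x_v[v] / p[c][e]   # x_e = p_{v,e} x_v = p_{c,e} x_c
                order.append(c)
    x = {e: p[v][e] * x_v[v] for v in parent for e in p[v]}
    total = sum(x.values())
    return {e: w / total for e, w in x.items()}


def sample_tree_prior(parent, a, rng):
    """Draw x from phi_{v0,a} using independent Dirichlet vectors p_v (display above)."""
    p = {}
    for v, u in parent.items():
        edges = incident_edges(parent, v)
        # e(v) = {v, parent[v]} gets (a_e + 1)/2; every other edge at v gets a_e / 2
        params = [(a[e] + 1) / 2 if u is not None and e == frozenset({v, u}) else a[e] / 2
                  for e in edges]
        g = [rng.gammavariate(t, 1.0) for t in params]   # Dirichlet via normalized gammas
        p[v] = {e: gi / sum(g) for e, gi in zip(edges, g)}
    return x_from_p(parent, p)


parent = {1: None, 2: 1, 3: 2}          # path 1 - 2 - 3 with loops, v0 = 1
a = {frozenset({v}): 1.0 for v in parent}
a.update({frozenset({1, 2}): 2.0, frozenset({2, 3}): 0.5})
print(sample_tree_prior(parent, a, random.Random(0)))
```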

3.3. The triangle. Consider the triangle with loops attached to all vertices. Let the vertex set be $V = \{1,2,3\}$ and the edge set $E = \{\{1\},\{2\},\{3\},\{1,2\},\{1,3\},\{2,3\}\}$ (see Figure 4). Let $b_i$ be the initial weight of the loop at vertex $i$ and let $c_i$ be the initial weight of the edge opposite of vertex $i$. Similarly, let $y_i := x_{\{i\}}$ and let $z_1 := x_{\{2,3\}}$, $z_2 := x_{\{1,3\}}$, $z_3 := x_{\{1,2\}}$.

The density $\phi_{1,a}(y_1,y_2,y_3,z_1,z_2,z_3)$ for $a = (b_1,b_2,b_3,c_1,c_2,c_3)$ is given by

$$Z_{1,a}^{-1}\,y_1^{(b_1/2)-1}y_2^{(b_2/2)-1}y_3^{(b_3/2)-1}\,z_1^{c_1-1}z_2^{c_2-1}z_3^{c_3-1}\sqrt{z_1z_2+z_1z_3+z_2z_3}\times\Big((y_1+z_2+z_3)^{(b_1+c_2+c_3)/2}(y_2+z_1+z_3)^{(b_2+c_1+c_3+1)/2}(y_3+z_1+z_2)^{(b_3+c_1+c_2+1)/2}\Big)^{-1}, \tag{31}$$

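with

$$Z_{1,a} = \frac{\Gamma(c_1)\Gamma(c_2)\Gamma(c_3)\Gamma(b_1/2)\Gamma(b_2/2)\Gamma(b_3/2)}{\Gamma\big(\frac{b_1+c_2+c_3}{2}\big)\Gamma\big(\frac{b_2+c_1+c_3+1}{2}\big)\Gamma\big(\frac{b_3+c_1+c_2+1}{2}\big)}\cdot\frac{480\,\pi}{2^{c_1+c_2+c_3}}. \tag{32}$$

To calculate $Z_{1,a}$ from (14), use the identity

$$\frac{\Gamma(b_i)}{\Gamma((b_i+1)/2)} = \frac{2^{b_i-1}}{\sqrt{\pi}}\,\Gamma(b_i/2) \qquad (i = 1,2,3). \tag{33}$$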

3.4. The complete graph. Perhaps the most important example is where all transitions are possible. This involves the complete graph $K_n$ on $n$ vertices with loops attached to all vertices. Let $V = \{1,2,3,\ldots,n\}$. Let $T_n$ be the spanning tree with edges $\{1,i\}$, $2\le i\le n$, and loops $\{i\}$, $1\le i\le n$. This spanning tree induces the basis of cycles given by all triangles $(1,i,j)$, $2\le i<j\le n$. Figure 5 shows $K_3$, $K_4$ and $K_5$ together with $T_3$, $T_4$ and $T_5$.

We remark that a different basis of cycles is given by … for $1\le i<j+1\le n$. This may be proved by induction using the Mayer–Vietoris decomposition theorem based on $K_{n-1}$ and a point.






P. DIACONIS AND S. W. W. ROLLES 



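Let $a = (a_{\{i,j\}})_{1\le i\le j\le n}$ be given. For $K_n$, set $b_i := a_{\{i\}}$, $a_i := \sum_{j=1}^n a_{\{i,j\}}$ and $b := \sum_{1\le i\le j\le n} a_{\{i,j\}}$. The variables of the simplex are $x = (x_{\{i,j\}})_{1\le i\le j\le n}$. Abbreviating $y_i := x_{\{i\}}$ and $x_i := \sum_{j=1}^n x_{\{i,j\}}$, the density $\phi_{1,a}$ is given by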

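$$\phi_{1,a}(x) = Z_{1,a}^{-1}\,\frac{\prod_{1\le i<j\le n} x_{\{i,j\}}^{a_{\{i,j\}}-1/2}\prod_{i=1}^n y_i^{(b_i/2)-1}}{x_1^{a_1/2}\prod_{i=2}^n x_i^{(a_i+1)/2}}\,\sqrt{\det(A_n(x))}, \tag{34}$$

with $A_n(x)$ defined in (12) and

$$Z_{1,a} = \frac{\prod_{1\le i\le j\le n}\Gamma(a_{\{i,j\}})}{\Gamma(a_1/2)\prod_{i=2}^n\Gamma((a_i+1)/2)\prod_{i=1}^n\Gamma((b_i+1)/2)}\times\frac{((n(n+1)/2)-1)!\,\pi^{(2n-1)/2}}{2^{1-2n+b}}. \tag{35}$$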


4. Properties of the family of priors. For $v_0\in V$ and $a = (a_e)_{e\in E}\in(0,\infty)^E$, abbreviate

$$\mathbb{P}_{v_0,a} := \phi_{v_0,a}\,d\sigma; \tag{36}$$

that is, $\mathbb{P}_{v_0,a}$ is the measure on $\Delta$ with density $\phi_{v_0,a}$. Recall that $Q_{v_0,a}$ denotes the mixture of Markov chains where the mixing measure, described as a measure on edge weights $(x_e)_{e\in E}$, is given by $\mathbb{P}_{v_0,a}$. In this section we study properties of the set of prior distributions

$$\mathcal{D} := \{\mathbb{P}_{v_0,a} : v_0\in V,\ a = (a_e)_{e\in E}\in(0,\infty)^E\}. \tag{37}$$



4.1. Closure under sampling. Recall the definition (6) of $k_e(\pi)$ and recall that $Z_n = (X_0,\ldots,X_n)$.

Proposition 4.1. Under the prior distribution $\mathbb{P}_{v_0,a}$ with observations $X_0 = v_0, X_1 = v_1,\ldots,X_n = v_n$, the posterior is given by $\mathbb{P}_{v_n,(a_e+k_e(Z_n))_{e\in E}}$. In particular, the family $\mathcal{D}$ is closed under sampling.



Proof. Suppose the prior distribution is $\mathbb{P}_{v_0,a}$ and we are given $n+1$ observations $\pi = (\pi_0,\pi_1,\ldots,\pi_n)$ sampled from $Q_{v_0,a}$. Then $\pi_0 = v_0$. We claim that the posterior is given by $\mathbb{P}_{\pi_n,(a_e+k_e(\pi))_{e\in E}}$. By Theorem 2.9, $Q_{v_0,a} = P_{v_0,a}$. The $P_{v_0,a}$-distribution of $(X_{n+k})_{k\ge0}$ given $Z_n = \pi$ is the distribution of an edge-reinforced random walk starting at the vertex $\pi_n$ with initial weights $a_e + k_e(\pi)$. Using the identity (19) again, it follows that the $P_{v_0,a}$-distribution of $(X_{n+k})_{k\ge0}$ given $Z_n = \pi$ equals $Q_{\pi_n,(a_e+k_e(\pi))_{e\in E}}$. Thus, the posterior equals $\mathbb{P}_{\pi_n,(a_e+k_e(\pi))_{e\in E}}$, which is an element of $\mathcal{D}$. □
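Proposition 4.1 reduces Bayesian updating to bookkeeping: the posterior weights are the prior weights plus the transition counts of (6), and the new starting vertex is the last observation. A minimal Python sketch (the name `posterior_parameters` is illustrative; edges are frozensets):

```python
def posterior_parameters(a, path):
    """Posterior of Proposition 4.1 for prior P_{v0,a} and data path = (pi_0, ..., pi_n).

    Returns the new starting vertex pi_n and the updated weights a_e + k_e(path),
    with edges encoded as frozensets and each loop traversal counted twice, as in (6).
    """
    new = dict(a)
    for u, v in zip(path, path[1:]):
        e = frozenset({u, v})
        if e not in new:
            raise ValueError(f"{u} -> {v} is not a possible transition")
        new[e] += 2 if u == v else 1
    return path[-1], new


a = {frozenset({1, 2}): 1, frozenset({2, 3}): 1, frozenset({1, 3}): 1, frozenset({1}): 1}
v_n, post = posterior_parameters(a, (1, 2, 3, 1, 1))
print(v_n, post[frozenset({1})])  # 1 3
```

Updating in two batches, first with $(1,2,3)$ and then with $(3,1,1)$ from the resulting posterior, gives the same parameters as a single update, as closure under sampling requires.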
4.2. Uniqueness. In this section we give a characterization of our priors along the lines of W. E. Johnson's characterization of the Dirichlet prior. See [18] for history and [19] for a version for nonreversible chains. The closely related topic of de Finetti's theorem for Markov chains is developed by Freedman [7] and Diaconis and Freedman [4]. See also [6].

Definition 4.2. Two finite admissible paths $\pi$ and $\pi'$ are called equivalent if they have the same starting point and satisfy $k_e(\pi) = k_e(\pi')$ for all $e\in E$. We define $P$ to be partially exchangeable if $P(Z_n = \pi) = P(Z_n = \pi')$ for any equivalent paths $\pi$ and $\pi'$ of length $n$.

For $n\in\mathbb{N}_0$ and $v\in V$, define

$$k_n(v) := \big|\{i\in\{0,1,\ldots,n\} : X_i = v\}\big|. \tag{38}$$

It seems natural to take a class $\mathcal{P}$ of distributions for $(X_n)_{n\in\mathbb{N}_0}$ with the following properties:

P1. For all $P\in\mathcal{P}$, there exists $v_0\in V$ such that $P(X_0 = v_0) = 1$.

P2. For all $P\in\mathcal{P}$, $v_0$ as in P1, and any admissible path $\pi$ of length $n\ge1$ starting at $v_0$, we have $P(Z_n = \pi) > 0$.

P3. Every $P\in\mathcal{P}$ is partially exchangeable.

P4. For all $P\in\mathcal{P}$, $v\in V$ and $e\in E$, there exists a function $f_{P,v,e}$ taking values in $[0,1]$ such that, for all $n\ge0$,

$$P(X_{n+1} = v\mid Z_n) = f_{P,X_n,\{X_n,v\}}\big(k_n(X_n),\,k_{\{X_n,v\}}(Z_n)\big).$$

The condition P4 says that, given $X_0, X_1,\ldots,X_n$, the probability that $X_{n+1} = v$ depends only on the following quantities: the observation $X_n$, the number of times $X_n$ has been observed so far, the edge $\{X_n,v\}$ and the number of times transitions between $X_n$ and $v$ (and between $v$ and $X_n$) have been observed so far.

We make the following assumptions on the graph $G$:

G1. For all $v\in V$, $\mathrm{degree}(v)\neq2$.

G2. The graph $G$ is 2-edge-connected, that is, removing an edge does not make $G$ disconnected.

For example, a triangle with loops or the complete graph $K_n$, $n\ge4$, with or without loops, satisfies G1 and G2, while a path fails both G1 and G2.

Recall that $Q_{v_0,x}$ is the distribution of the reversible Markov chain starting in $v_0$, making a transition from $v$ to $v'$ with probability proportional to $x_{\{v,v'\}}$ whenever $\{v,v'\}\in E$.

Theorem 4.3. Suppose the graph $G$ satisfies G1 and G2.

(a) The set $M := \{Q_{v_0,a} : v_0\in V,\ a = (a_e)_{e\in E}\in(0,\infty)^E\}$ satisfies P1–P4.
(b) On the other hand, if P1–P4 are satisfied for a set $\mathcal{P}$ of probability distributions, then for all $P\in\mathcal{P}$ there exist $v_0\in V$ and $a\in(0,\infty)^E$ such that either
P(X n+1 = v\Z n ,k n (X n )>3) 

= Qv ,a( X n+l=v\ Z n,k n (X n )>3) V?7,>0 Or 

(39) 

P(X n+1 =v\Z n ,k n {X n )>3) 

= Qv ,a( x n+i=v\Z n ,k n (X n )>3) Vra>0. 

The second part of the theorem states that either P and Qj, 0i a or P and 
Qv ,a essentially agree; only the conditional probabilities to leave from a 
state which has been visited at most twice could be different. 

Proof of Theorem 4.3. It is straightforward to check that $M$ has the properties P1–P4. For the converse, let $P\in\mathcal{P}$. If $G$ has no loops, then Theorem 1.2 of Rolles [16] implies that there exist $v_0\in V$ and $a\in(0,\infty)^E$ such that either (39) holds or $P(X_{n+1} = v\mid Z_n, k_n(X_n)\ge3) = P_{v_0,a}(X_{n+1} = v\mid Z_n, k_n(X_n)\ge3)$ for all $n$. In this case, the claim follows from (19).

If $G$ has loops, consider the graph $G'$ defined in the proof of Theorem 2.9 and the induced process $X' := (X'_n)_{n\in\mathbb{N}_0}$ on $G'$ with reflection at the vertices $v'(e)$, $e\in E_{\mathrm{loop}}$. The process $X'$ satisfies P1–P4. Hence, the claim holds for $X'$ and, consequently, for $(X_n)_{n\in\mathbb{N}_0}$. □

Remark 4.4. The preceding theorem holds under the assumption that the graph $G$ is 2-edge-connected (G2). If $G$ is not 2-edge-connected, a similar statement can be proved for a different class of priors: One replaces the class $\mathcal{D}$ by the mixing measures of a so-called modified edge-reinforced random walk; for the definition of this process, see Definition 2.1 of [16]. A uniqueness statement similar to Theorem 4.3 follows from Theorem 2.1 of [16].

4.3. The priors are dense. As shown by Dalal and Hall [2] and Diaconis and Ylvisaker [5] for classical exponential families, mixtures of conjugate priors are dense in the space of all priors. This also holds for reversible Markov chains.

Proposition 4.5. The set of convex combinations of priors in $\mathcal{D}$ is weak-star dense in the set of all prior distributions on reversible Markov chains on $G$.

Proof. For an infinite admissible path $\pi = (\pi_0,\pi_1,\pi_2,\ldots)$ in $G$, define $\alpha(\pi) := (\alpha_e(\pi))_{e\in E}$ by $\alpha_e(\pi) := \lim_{n\to\infty}k_e(\pi_0,\pi_1,\ldots,\pi_n)/n$, the limiting fraction of crossings of the edge $e$ by the path $\pi$. Let $Z_\infty := (X_0, X_1, X_2,\ldots)$. Note that $\alpha(Z_\infty)$ is defined $Q_{v_0,a}$-a.s. Define $\tau_n$ to be the $n$th return time to $v_0$. Since $G$ is finite, $\tau_n<\infty$ $Q_{v_0,a}$-a.s. for all $n\in\mathbb{N}$ and all $a\in(0,\infty)^E$.

Let $f:\Delta\to\mathbb{R}$ be bounded and continuous. Denote the expectation with respect to $Q_{v_0,a}$ by $E_{v_0,a}$. Since $X_{\tau_n} = v_0$, Theorem 2.9 implies that

$$E_{v_0,(a_e+k_e(Z_{\tau_n}))_{e\in E}}[f(\alpha(Z_\infty))] = E_{v_0,a}[f(\alpha(Z_\infty))\mid Z_{\tau_n}] =: M_n. \tag{40}$$

Clearly, $(M_n)_{n\ge0}$ is a bounded martingale. Since $\tau_n\to\infty$, the $\sigma$-fields generated by $Z_{\tau_n}$ increase to the $\sigma$-field generated by $Z_\infty$. Hence, by the martingale convergence theorem,

$$\lim_{n\to\infty}E_{v_0,(a_e+k_e(Z_{\tau_n}))_{e\in E}}[f(\alpha(Z_\infty))] = E_{v_0,a}[f(\alpha(Z_\infty))\mid Z_\infty] = f(\alpha(Z_\infty)) = \int f\,d\delta_{\alpha(Z_\infty)} \tag{41}$$

$Q_{v_0,a}$-a.s.; here $\delta_b$ denotes the point mass in $b$. Since $\Delta$ is compact, there is a countable dense subset of the set of bounded continuous functions on $\Delta$. Hence, the above shows that, for $Q_{v_0,a}$-almost all $Z_\infty$,

$$Q_{v_0,(a_e+k_e(Z_{\tau_n}))_{e\in E}}(\alpha(Z_\infty)\in\cdot\,)\Rightarrow\delta_{\alpha(Z_\infty)}(\cdot) \quad\text{weakly as } n\to\infty. \tag{42}$$

The $Q_{v_0,a}$-distribution of $\alpha(Z_\infty)$ equals $\mathbb{P}_{v_0,a}$. Thus,

$$\mathbb{P}_{v_0,(a_e+k_e(Z_{\tau_n}))_{e\in E}}(\cdot)\Rightarrow\delta_{\alpha(Z_\infty)}(\cdot) \quad\text{weakly as } n\to\infty \tag{43}$$

for $Q_{v_0,a}$-almost all $Z_\infty$. Recall that the $Q_{v_0,a}$-distribution of $\alpha(Z_\infty)$ (viz., $\mathbb{P}_{v_0,a}$) is absolutely continuous with respect to Lebesgue measure on $\Delta$ with the density $\phi_{v_0,a}$, which is strictly positive in the interior of $\Delta$. Hence, for Lebesgue-almost all $\alpha\in\Delta$, there is a sequence $a_n\in(0,\infty)^E$ such that $\mathbb{P}_{v_0,a_n}\Rightarrow\delta_\alpha$ weakly. By the Krein–Milman theorem, convex combinations of point masses are weak-star dense in the set of all probability measures on $\Delta$. Using a standard argument, it follows that the set of convex combinations of the distributions $\mathbb{P}_{v_0,a}$ is dense in the set of all probability measures on $\Delta$. This completes the proof of the proposition. □



4.4. Computing some moments. For any edge $e_0\in E$, we can calculate the probability that the mixture of Markov chains with mixing measure $\phi_{v_0,a}\,d\sigma$ traverses $e_0$ back and forth starting at an endpoint of $e_0$. This gives a closed form for certain moments of the prior $\mathbb{P}_{v_0,a}$.



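Proposition 4.6. For $e_0\in E\setminus E_{\mathrm{loop}}$ with endpoints $v$ and $v'$, we have

$$\int_\Delta\frac{x_{e_0}^2}{x_v x_{v'}}\,\phi_{v_0,a}(x)\,d\sigma(x) = \begin{cases}\dfrac{a_{e_0}(a_{e_0}+1)}{(a_v+1)(a_{v'}+1)}, & \text{if } v_0\notin\{v,v'\},\\[8pt] \dfrac{a_{e_0}(a_{e_0}+1)}{a_v(a_{v'}+1)}, & \text{if } v = v_0.\end{cases} \tag{44}$$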
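For a loop $e_0\in E_{\mathrm{loop}}$ incident to $v$, we have

$$\int_\Delta\frac{x_{e_0}}{x_v}\,\phi_{v_0,a}(x)\,d\sigma(x) = \begin{cases}\dfrac{a_{e_0}}{a_v+1}, & \text{if } v\neq v_0,\\[8pt] \dfrac{a_{e_0}}{a_v}, & \text{if } v = v_0.\end{cases} \tag{45}$$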



Proof. Case $e_0\in E\setminus E_{\mathrm{loop}}$. Suppose $v_0$ is an endpoint of $e_0$, say, $v = v_0$. Then

$$\int_\Delta\frac{x_{e_0}^2}{x_v x_{v'}}\,\phi_{v_0,a}(x)\,d\sigma(x) = Q_{v_0,a}(X_0 = v_0, X_1 = v', X_2 = v_0); \tag{46}$$

this is the probability that the mixture of Markov chains traverses the edge $e_0$ back and forth starting at $v_0$. By (19), $Q_{v_0,a} = P_{v_0,a}$. Hence, (46) equals the probability that an edge-reinforced random walk traverses $e_0$ back and forth, namely,

$$P_{v_0,a}(X_0 = v_0, X_1 = v', X_2 = v_0) = \frac{a_{e_0}(a_{e_0}+1)}{a_{v_0}(a_{v'}+1)}. \tag{47}$$

Here we used the fact that the sum of the weights of all edges incident to $v'$ equals $a_{v'}+1$ after $e_0$ has been traversed once. This proves the claim in the case $v_0\in e_0$.

Suppose $v_0\notin e_0$. Define $b := (b_e)_{e\in E}$ by $b_{e_0} := a_{e_0}+2$ and $b_e := a_e$ for $e\in E\setminus\{e_0\}$. Then, using the definition of $\phi_{v_0,a}$, we obtain

$$\frac{x_{e_0}^2}{x_v x_{v'}}\,\phi_{v_0,a}(x) = \frac{Z_{v_0,b}}{Z_{v_0,a}}\,\phi_{v_0,b}(x) \quad\text{for all } x\in\Delta. \tag{48}$$

Using the definition of the normalizing constants $Z_{v_0,a}$ and $Z_{v_0,b}$ and the identity $\Gamma(z+1) = z\Gamma(z)$, it follows that

$$\frac{Z_{v_0,b}}{Z_{v_0,a}} = \frac{\Gamma((a_v+1)/2)\,\Gamma((a_{v'}+1)/2)\,\Gamma(a_{e_0}+2)}{4\,\Gamma((a_v+3)/2)\,\Gamma((a_{v'}+3)/2)\,\Gamma(a_{e_0})} = \frac{a_{e_0}(a_{e_0}+1)}{(a_v+1)(a_{v'}+1)}. \tag{49}$$

Since $\int_\Delta\phi_{v_0,b}(x)\,d\sigma(x) = 1$, the claim follows by integrating both sides of (48) over $\Delta$.

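Case $e_0\in E_{\mathrm{loop}}$. The proof follows the same lines as in the case $e_0\notin E_{\mathrm{loop}}$. Let $e_0 = \{v\}$ be incident to $v$. We prove only the case $v\neq v_0$. Defining $b$ as above, the analogue of (48) with $x_{e_0}^2/(x_v x_{v'})$ replaced by $x_{e_0}/x_v$ is valid with

$$\frac{Z_{v_0,b}}{Z_{v_0,a}} = \frac{\Gamma((a_v+1)/2)\,\Gamma((a_{e_0}+1)/2)\,\Gamma(a_{e_0}+2)}{4\,\Gamma((a_v+3)/2)\,\Gamma((a_{e_0}+3)/2)\,\Gamma(a_{e_0})} = \frac{a_{e_0}(a_{e_0}+1)}{(a_v+1)(a_{e_0}+1)} = \frac{a_{e_0}}{a_v+1}; \tag{50}$$

here we used again the identity $\Gamma(z+1) = z\Gamma(z)$. The claim follows. □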
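The closed forms (44) and (45) can be coded directly. In the Python sketch below (hypothetical helper name; exact arithmetic with `fractions`; vertex weights $a_v$ count a loop once, as in (4)), the case $v = v_0$ of (44) can be compared with the step-by-step reinforced-walk probability (47).

```python
from fractions import Fraction


def prior_moment(a, v0, e0):
    """Closed forms (44) and (45) of Proposition 4.6.

    `a` maps frozenset edges to a_e; for an ordinary edge e0 = {v, v'} the value is
    the prior mean of x_{e0}^2 / (x_v x_{v'}), for a loop e0 = {v} the prior mean
    of x_{e0} / x_v.  Vertex weights a_v count a loop once, as in (4).
    """
    def a_vertex(v):
        return sum((Fraction(w) for e, w in a.items() if v in e), Fraction(0))

    ae = Fraction(a[e0])
    if len(e0) == 1:                              # loop, formula (45)
        (v,) = e0
        return ae / a_vertex(v) if v == v0 else ae / (a_vertex(v) + 1)
    v, w = sorted(e0)
    if v0 == w:                                   # make v the starting point if v0 is on e0
        v, w = w, v
    if v0 == v:                                   # second case of (44)
        return ae * (ae + 1) / (a_vertex(v) * (a_vertex(w) + 1))
    return ae * (ae + 1) / ((a_vertex(v) + 1) * (a_vertex(w) + 1))


a = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({1, 3}): 3, frozenset({1}): 1}
print(prior_moment(a, 1, frozenset({1, 2})))   # 1/10
# (47): from vertex 1 the walk takes {1,2} with prob. 1/5, then returns with prob. 2/4.
print(Fraction(1, 5) * Fraction(2, 4))         # 1/10
```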
Recall the definitions (5) and (6) of $k_v(\pi)$ and $k_e(\pi)$ for a finite admissible path $\pi$ in $G$. Abbreviate $k_v := k_v(\pi)$, $k_e := k_e(\pi)$. For $x = (x_e)_{e\in E}\in\Delta$, denote by $Q_x(\pi)$ the probability that the reversible Markov chain with transition probabilities induced by the weights $(x_e)_{e\in E}$ on the edges traverses the path $\pi$. Note that if $\pi$ is a closed path, that is, if the starting point and endpoint of $\pi$ agree, then $Q_x(\pi)$ is independent of the starting point of $\pi$. An argument as in the proof of Proposition 4.6 yields the following:

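Proposition 4.7. For any finite admissible path $\pi$ starting at $v_0$, we have

$$\int_\Delta Q_x(\pi)\,\phi_{v_0,a}(x)\,d\sigma(x) = \frac{\Big[\prod_{e\in E\setminus E_{\mathrm{loop}}}\prod_{i=0}^{k_e-1}(a_e+i)\Big]\Big[\prod_{e\in E_{\mathrm{loop}}}\prod_{i=0}^{k_e/2-1}(a_e+2i)\Big]}{\prod_{i=0}^{k_{v_0}-1}(a_{v_0}+2i)\,\prod_{v\in V\setminus\{v_0\}}\prod_{i=0}^{k_v-1}(a_v+1+2i)}. \tag{51}$$

For any finite admissible path $\pi$ with the same starting point and endpoint which avoids $v_0$, we have

$$\int_\Delta Q_x(\pi)\,\phi_{v_0,a}(x)\,d\sigma(x) = \frac{\Big[\prod_{e\in E\setminus E_{\mathrm{loop}}}\prod_{i=0}^{k_e-1}(a_e+i)\Big]\Big[\prod_{e\in E_{\mathrm{loop}}}\prod_{i=0}^{k_e/2-1}(a_e+2i)\Big]}{\prod_{v\in V}\prod_{i=0}^{k_v-1}(a_v+1+2i)}. \tag{52}$$

Here the empty product is defined to be 1.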
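By (18) and (19), the left-hand side of (51) is the probability that the edge-reinforced random walk follows $\pi$. The Python sketch below (helper names are illustrative) computes that probability step by step from Definition 2.7 and compares it with the closed form (51) in exact arithmetic. At the $i$th departure from $v$ (counting from $i = 0$), the total weight at $v$ is $a_v + 2i$ if $v = v_0$ and $a_v + 1 + 2i$ otherwise, which explains the factors in the denominator.

```python
from fractions import Fraction


def errw_path_probability(a, path):
    """Probability that the reinforced walk started at path[0] follows path (Definition 2.7)."""
    w = {e: Fraction(x) for e, x in a.items()}
    prob = Fraction(1)
    for u, v in zip(path, path[1:]):
        e = frozenset({u, v})
        total = sum((x for f, x in w.items() if u in f), Fraction(0))
        prob *= w[e] / total
        w[e] += 2 if u == v else 1
    return prob


def formula_51(a, path):
    """Right-hand side of (51) with v0 = path[0]."""
    v0 = path[0]
    k_v, k_e = {}, {}
    for u, v in zip(path, path[1:]):
        e = frozenset({u, v})
        k_v[u] = k_v.get(u, 0) + 1
        k_e[e] = k_e.get(e, 0) + (2 if u == v else 1)
    a_vertex = {v: sum((Fraction(x) for e, x in a.items() if v in e), Fraction(0))
                for v in k_v}
    num = Fraction(1)
    for e, k in k_e.items():
        ae = Fraction(a[e])
        if len(e) == 1:                     # loop: k_e / 2 traversals, weight grows by 2
            for i in range(k // 2):
                num *= ae + 2 * i
        else:
            for i in range(k):
                num *= ae + i
    den = Fraction(1)
    for v, k in k_v.items():
        offset = 0 if v == v0 else 1        # a_v at v0, a_v + 1 at other vertices
        for i in range(k):
            den *= a_vertex[v] + offset + 2 * i
    return num / den


a = {frozenset({1, 2}): 1, frozenset({2, 3}): 2, frozenset({1, 3}): 3, frozenset({1}): 1}
path = (1, 2, 1, 1, 3, 2, 3, 1, 1)
print(errw_path_probability(a, path) == formula_51(a, path))
```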

If $\pi$ is a closed path, we call $Q_x(\pi)$ a cycle probability. The transition probabilities of a Markov chain with finite state space $V$ that visits every state with probability 1 are completely determined by all its cycle probabilities (see, e.g., [7], Corollary on page 116).

4.5. Simulating from the posterior. In this subsection we show how the posterior distribution of the unknown stationary distribution for the underlying Markov chain can be simulated using reinforced random walks.

Suppose our posterior distribution is $\mathbb{P}_{v_0,a} = \phi_{v_0,a}\,d\sigma$. Let $X^{(i)} := (X_n^{(i)})_{n\ge0}$, $i\ge1$, be independent reinforced random walks starting at $v_0$ with the same initial edge weights $a = (a_e)_{e\in E}$. Let $Z_n^{(i)} := (X_0^{(i)}, X_1^{(i)},\ldots,X_n^{(i)})$ and recall that $k_e(Z_n^{(i)})$ equals the number of traversals of edge $e$ by the process $X^{(i)}$ up to time $n$ (each traversal of a loop counted twice).



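Proposition 4.8. For any interval $I \subseteq \mathbb{R}$ and all $e \in E$, we have

$$\lim_{n\to\infty}\lim_{m\to\infty}\frac{1}{m}\left|\left\{i\le m:\ \frac{k_e(Z_n^{(i)})}{n}\in I\right\}\right| = \mathcal{P}_{v_0,a}(x_e\in I)\quad\text{a.s.} \tag{53}$$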
Table 1
Degrees of freedom for independent, reversible and full Markov specification

|V|                                  3    4    5   10    20     50    100      1000
Independent   |V| − 1                2    3    4    9    19     49     99       999
Reversible    |V|(|V| − 1)/2 − 1     2    5    9   44   189   1224   4949    499499
Full Markov   |V|(|V| − 1)           6   12   20   90   380   2450   9900    999000



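Proof. For every $n$, the random variables $k_e(Z_n^{(i)})/n$, $i \ge 1$, are i.i.d. Hence, by the Glivenko-Cantelli theorem, a.s. for all $x \in \mathbb{R}$,

$$\lim_{m\to\infty}\frac{1}{m}\left|\left\{i\le m:\ \frac{k_e(Z_n^{(i)})}{n}\le x\right\}\right| = P_{v_0,a}\left(\frac{k_e(Z_n)}{n}\le x\right) = Q_{v_0,a}\left(\frac{k_e(Z_n)}{n}\le x\right). \tag{54}$$

For the last equality we used (19). Since $Q_{v_0,a}$ is a mixture of Markov chains, $k_e(Z_n)/n$ converges $Q_{v_0,a}$-a.s., and hence weakly, to the normalized weight of the edge $e$. Since the limiting distribution is continuous,

$$\lim_{n\to\infty} Q_{v_0,a}\left(\frac{k_e(Z_n)}{n}\le x\right) = \mathcal{P}_{v_0,a}(x_e\le x), \tag{55}$$

and the claim follows. □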



Proposition 4.9. For all $e \in E$,

$$\lim_{n\to\infty}\int \frac{k_e(Z_n)}{n}\,dP_{v_0,a} = \int x_e\,d\mathcal{P}_{v_0,a}. \tag{56}$$

Proof. By (19), $P_{v_0,a} = Q_{v_0,a}$. Since the proportion $k_e(Z_n)/n$ is bounded and converges $Q_{v_0,a}$-a.s. to the normalized weight of the edge $e$, the claim follows from the dominated convergence theorem. □

Remark 4.10. The Markov chain with distribution induced by the edge weights $(x_e)_{e\in E} \in \Delta$ has the stationary distribution $\nu(v) = \frac{x_v}{2} = \frac12\sum_{e\in E_v} x_e$. Thus, Propositions 4.8 and 4.9 allow simulation of the $\mathcal{P}_{v_0,a}$-distribution and of the mean of $\nu(v)$.
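Propositions 4.8 and 4.9 suggest a direct Monte Carlo scheme: run $m$ independent reinforced random walks for $n$ steps from $v_0$ and use the empirical distribution, or the mean, of $k_e(Z_n^{(i)})/n$. The Python sketch below is a minimal version under an assumed reinforcement rule (add 1 to an ordinary edge and 2 to a loop after each traversal, choosing edges with probability proportional to current weight), which is our reading of Proposition 4.7 rather than a quotation of the paper's definition. The graph, weights, $n$, $m$, interval and seed are hypothetical choices, and with finite $n$ and $m$ the output only approximates the double limit in (53).

```python
import random
from collections import Counter


def edge(u, v):
    """Undirected edge {u, v}; a loop at v is frozenset({v})."""
    return frozenset((u, v))


def rrw_edge_fractions(weights, v0, n, rng):
    """Run one edge-reinforced random walk for n steps from v0; return k_e / n per edge.

    Assumed scheme: from v choose an incident edge with probability proportional
    to its current weight; add 1 to an ordinary edge and 2 to a loop. Loop counts
    k_e also grow by 2 per traversal, as on the diagonal of Table 4."""
    w = dict(weights)
    incident = {}
    for e in w:
        for v in e:
            incident.setdefault(v, []).append(e)
    counts = Counter()
    v = v0
    for _ in range(n):
        options = incident[v]
        e = rng.choices(options, weights=[w[f] for f in options])[0]
        step = 2 if len(e) == 1 else 1
        w[e] += step
        counts[e] += step
        if len(e) == 2:
            (v,) = e - {v}  # move to the other endpoint
    return {e: counts[e] / n for e in w}


def simulate(weights, v0, n, m, seed=0):
    """m independent walks; their k_e / n values approximate the mixing measure."""
    rng = random.Random(seed)
    return [rrw_edge_fractions(weights, v0, n, rng) for _ in range(m)]


# Hypothetical run: complete graph with loops on {a, c, g, t}, unit weights, start at t.
weights = {edge(u, v): 1.0 for u in "acgt" for v in "acgt"}
samples = simulate(weights, "t", n=1000, m=200)
e_ac = edge("a", "c")
values = [s[e_ac] for s in samples]
print("estimated mean of x_{a,c}:", sum(values) / len(values))
print("estimated P(x_{a,c} in [0.05, 0.15]):",
      sum(0.05 <= x <= 0.15 for x in values) / len(values))
```

To target a particular $\mathcal{P}_{v_0,a}$, one would pass its weights $a$ and starting point $v_0$; larger $n$ and $m$ trade run time for accuracy.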

5. Applications. Reversibility can serve as a natural intermediate between independence and fully nonparametric Markovian dependence. On $|V|$ states, with no restrictions, the number of free parameters is $|V| - 1$ for independence, $|V|(|V| - 1)$ for full Markov and $|V|(|V| - 1)/2 - 1$ for reversibility. As Table 1 indicates, these numbers vary widely for large $|V|$.

Table 2
The human HLA-B gene. Part of the DNA sequence of length 3370

   1  tggtgtagga gaagagggat caggacgaag tcccaggtcc cggacggggc tctcagggtc
  61  tcaggctccg agggccgcgt ctgcaatggg gaggcgcagc gttggggatt ccccactccc
 121  ctgagtttca cttcttctcc caacttgtgt cgggtccttc ttccaggata ctcgtgacgc
 181  gtccccactt cccactccca ttgggtattg gatatctaga gaagccaatc agcgtcgccg
 241  cggtcccagt tctaaagtcc ccacgcaccc acccggactc agagtctcct cagacgccga
 301  gatgctggtc atggcgcccc gaaccgtcct cctgctgctc tcggcggccc tggccctgac
 361  cgagacctgg gccggtgagt gcgggtcggg agggaaatgg cctctgccgg gaggagcgag
 421  gggaccgcag gcgggggcgc aggacctgag gagccgcgcc gggaggaggg tcgggcgggt
 481  ctcagcccct cctcaccccc aggctcccac tccatgaggt atttctacac ctccgtgtcc
 541  cggcccggcc gcggggagcc ccgcttcatc tcagtgggct acgtggacga cacccagttc
 601  gtgaggttcg acagcgacgc cgcgagtccg agagaggagc cgcgggcgcc gtggatagag
 661  caggaggggc cggagtattg ggaccggaac acacagatct acaaggccca ggcacagact
 721  gaccgagaga gcctgcggaa cctgcgcggc tactacaacc agagcgaggc cggtgagtga
 781  ccccggcccg gggcgcaggt cacgactccc catcccccac gtacggcccg ggtcgccccg
 841  agtctccggg tccgagatcc gcctccctga ggccgcggga cccgcccaga ccctcgaccg
 901  gcgagagccc caggcgcgtt tacccggttt cattttcagt tgaggccaaa atccccgcgg
 961  gttggtcggg gcggggcggg gctcggggga ctgggctgac cgcggggccg gggccagggt
1021  ctcacaccct ccagagcatg tacggctgcg acgtggggcc ggacgggcgc ctcctccgcg
1081  ggcatgacca gtacgcctac gacggcaagg attacatcgc cctgaacgag gacctgcgct
1141  cctggaccgc cgcggacacg gcggctcaga tcacccagcg caagtgggag gcggcccgtg
1201  aggcggagca gcggagagcc tacctggagg gcgagtgcgt ggagtggctc cgcagatacc
1261  tggagaacgg gaaggacaag ctggagcgcg ctggtaccag gggcagtggg gagccttccc
1321  catctcctat aggtcgccgg ggatggcctc ccacgagaag aggaggaaaa tgggatcagc
1381  gctagaatgt cgccctccgt tgaatggaga atggcatgag ttttcctgag tttcctctga
1441  gggccccctc ttctctctag acaattaagg aatgacgtct ctgaggaaat ggaggggaag
1501  acagtcccta gaatactgat caggggtccc ctttgacccc tgcagcagcc ttgggaaccg
1561  tgacttttcc tctcaggcct tgttctctgc ctcacactca gtgtgtttgg ggctctgatt
1621  ccagcacttc tgagtcactt tacctccact cagatcagga gcagaagtcc ctgttccccg
1681  ctcagagact cgaactttcc aatgaatagg agattatccc aggtgcctgc gtccaggctg
1741  gtgtctgggt tctgtgcccc ttccccaccc caggtgtcct gtccattctc aggctggtca
1801  catgggtggt cctagggtgt cccatgaaag atgcaaagcg cctgaatttt ctgactcttc
1861  ccatcagacc ccccaaagac acacgtgacc caccacccca tctctgacca tgaggccacc
1921  ctgaggtgct gggccctggg tttctaccct gcggagatca cactgacctg gcagcgggat
1981  ggcgaggacc aaactcagga cactgagctt gtggagacca gaccagcagg agatagaacc
2041  ttccagaagt gggcagctgt ggtggtgcct tctggagaag agcagagata cacatgccat
2101  (illegible)
2161  gggtcatatc tcttctcagg gaaagcagga gcccttcagc agggtcaggg cccctcatct
2221  tcccctcctt tcccagagcc gtcttcccag tccaccgtcc ccatcgtggg cattgttgct
2281  ggcctggctg tcctagcagt tgtggtcatc ggagctgtgg tcgctgctgt gatgtgtagg
2341  aggaagagtt caggtaggga aggggtgagg ggtggggtct gggttttctt gtcccactgg
2401  gggtttcaag ccccaggtag aagtgttccc tgcctcatta ctgggaagca gcatgcacac
2461  aggggctaac gcagcctggg accctgtgtg ccagcactta ctcttttgtg cagcacatgt
2521  gacaatgaag gatggatgta tcaccttgat ggttgtggtg ttggggtcct gattccagca
2581  ttcatgagtc aggggaaggt ccctgctaag gacagacctt aggagggcag ttggtccagg
2641  acccacactt gctttcctcg tgtttcctga tcctgccctg ggtctgtagt catacttctg
2701  gaaattcctt ttgggtccaa gactaggagg ttcctctaag atctcatggc cctgcttcct
2761  cccagtgccc tcacaggaca ttttcttccc acaggtggaa aaggagggag ctactctcag
2821  gctgcgtgta agtggtgggg gtgggagtgt ggaggagctc acccacccca taattcctcc
2881  tgtcccacgt ctcctgcggg ctctgaccag gtcctgtttt tgttctactc caggcagcga
2941  cagtgcccag ggctctgatg tgtctctcac agcttgaaaa ggtgagattc ttggggtcta
3001  gagtgggtgg ggtggcgggt ctgggggtgg gtggggcaga ggggaaaggc ctgggtaatg
3061  gggattcttt gattgggatg tttcgcgtgt gtggtgggct gtttagagtg tcatcgctta
3121  ccatgactaa ccagaatttg ttcatgactg ttgttttctg tagcctgaga cagctgtctt
3181  gtgagggact gagatgcagg atttcttcac gcctcccctt tgtgacttca agagcctctg
3241  gcatctcttt ctgcaaaggc acctgaatgt gtctgcgtcc ctgttagcat aatgtgagga
3301  ggtggagaga cagcccaccc ttgtgtccac tgtgacccct gttcgcatgc tgacctgtgt
3361  ttcctcccca

Table 3
Occurrences $N_{ij}$ of the string $ij$ for $i, j \in \{a, c, g, t\}$

        a     c     g     t
  a    91   160   261   108
  c   213   351   161   249
  g   251   224   388   201
  t    66   239   254   152

In this section we illustrate the use of our priors for testing a variety of simple hypotheses. Table 2 shows a genetic data set from the DNA sequence of the human HLA-B gene. This gene plays a central role in the immune system. The data displayed in Table 2 were downloaded from the webpage of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/genome/guide/human/).

In Example A, we test i.i.d.(1/4) versus i.i.d. (unknown) for the DNA data. In Example B, we test i.i.d. versus reversible. In Example C, we test reversible versus full Markov. In Example D, we compare i.i.d. with full Markov.

Let $n_a$, $n_c$, $n_g$ and $n_t$ denote the number of occurrences of a, c, g and t, respectively, in the data displayed in Table 2. Then

$$n_a = 621,\quad n_c = 974,\quad n_g = 1064,\quad n_t = 711. \tag{57}$$
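The summary statistics used in this section (the letter counts (57), the pair counts $N_{ij}$ of Table 3, the undirected counts of Table 4 and the departure counts $k_v$) are simple tallies of the sequence. A Python sketch of these tallies follows; the function names are ours. Applied to the full sequence of Table 2 the tallies should reproduce (57), Table 3 and Table 4, but the demonstration below uses only the first ten letters, so its output is not (57). Following Table 4, a loop traversal $i \to i$ counts twice in the undirected counts.

```python
from collections import Counter

STATES = "acgt"


def tallies(seq):
    """Letter counts n_v, ordered pair counts N_ij (Table 3) and departure counts k_v."""
    seq = seq.lower()
    n_v = Counter(seq)
    N = Counter(zip(seq, seq[1:]))
    k_v = Counter(seq[:-1])  # every letter except the last starts a transition
    return n_v, N, k_v


def undirected_counts(N, states=STATES):
    """Undirected counts as in Table 4; a loop traversal i -> i counts twice."""
    k = {}
    for i in states:
        for j in states:
            if i < j:
                k[(i, j)] = N.get((i, j), 0) + N.get((j, i), 0)
            elif i == j:
                k[(i, i)] = 2 * N.get((i, i), 0)
    return k


# Demonstration on the first ten letters of Table 2 (not the full sequence).
n_v, N, k_v = tallies("tggtgtagga")
print(dict(n_v), dict(k_v), undirected_counts(N))

# Table 3 converted to the undirected counts of Table 4.
TABLE3 = {
    ("a", "a"): 91, ("a", "c"): 160, ("a", "g"): 261, ("a", "t"): 108,
    ("c", "a"): 213, ("c", "c"): 351, ("c", "g"): 161, ("c", "t"): 249,
    ("g", "a"): 251, ("g", "c"): 224, ("g", "g"): 388, ("g", "t"): 201,
    ("t", "a"): 66, ("t", "c"): 239, ("t", "g"): 254, ("t", "t"): 152,
}
table4 = undirected_counts(TABLE3)
print(table4)
```

The row sums of Table 3 equal the departure counts $k_v$; for instance, row a gives $91 + 160 + 261 + 108 = 620 = k_a$.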



Example A. A Bayes test of $H_0$: i.i.d.(1/4) versus $H_1$: i.i.d. (unknown). A "standard" test can be based on the Bayes factor

$$\frac{P(\text{data}\mid H_0)}{P(\text{data}\mid H_1)}.$$

See [9] for an extensive discussion. For $H_1$, we use a Dirichlet(1, 1, 1, 1) prior. This yields

$$P(\text{data}\mid H_0) = \left(\tfrac14\right)^{3370} \approx 1.142429015368253\cdot 10^{-2029},$$

$$P(\text{data}\mid H_1) = \frac{\Gamma(4)\Gamma(n_a+1)\Gamma(n_c+1)\Gamma(n_g+1)\Gamma(n_t+1)}{\Gamma(n_a+n_c+n_g+n_t+4)} = \frac{\Gamma(4)\Gamma(622)\Gamma(975)\Gamma(1065)\Gamma(712)}{\Gamma(3374)} \approx 1.140417804695619\cdot 10^{-1999},$$

so that

$$\frac{P(\text{data}\mid H_0)}{P(\text{data}\mid H_1)} \approx 1.00176\cdot 10^{-30}.$$

Thus, $H_0$ is strongly rejected. This is not surprising since the observed numbers of a, c, g, t are $n_a = 621$, $n_c = 974$, $n_g = 1064$, $n_t = 711$, respectively, far from the 842.5 expected for each letter under $H_0$.
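The probabilities in Example A are far below the smallest positive double, so any check has to work with logarithms. The Python sketch below evaluates both marginal likelihoods with `math.lgamma` and reports them in mantissa-exponent form; the helper names are hypothetical.

```python
from math import floor, lgamma, log, log10


def log10_uniform_iid(counts):
    """log10 P(data | H0) for i.i.d. letters, uniform on len(counts) symbols."""
    return -sum(counts) * log10(len(counts))


def log10_dirichlet_iid(counts, alpha=1.0):
    """log10 of the probability of one particular sequence with these letter
    counts under i.i.d. sampling and a symmetric Dirichlet(alpha) prior."""
    d = len(counts)
    ln_p = (lgamma(d * alpha) - d * lgamma(alpha)
            + sum(lgamma(n + alpha) for n in counts)
            - lgamma(sum(counts) + d * alpha))
    return ln_p / log(10)


def sci(log10_value):
    """Format 10**log10_value as mantissa * 10^exponent without underflow."""
    exponent = floor(log10_value)
    return f"{10 ** (log10_value - exponent):.6f} * 10^{exponent}"


counts = [621, 974, 1064, 711]  # n_a, n_c, n_g, n_t from (57)
log_p0 = log10_uniform_iid(counts)
log_p1 = log10_dirichlet_iid(counts)
print("P(data|H0) =", sci(log_p0))
print("P(data|H1) =", sci(log_p1))
print("Bayes factor =", sci(log_p0 - log_p1))
```

With `alpha=1.0` the second function is exactly the Gamma-function ratio displayed above, since $\Gamma(1) = 1$.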

Example B. A Bayes test of $H_0$: i.i.d. (unknown) versus $H_1$: reversible.

Here we use a Dirichlet(1, 1, 1, 1) prior for the null hypothesis and, for the alternative, the prior based on the complete graph with loops (see Figure 5) with all edge weights equal to 1. The probability $P(\text{data}\mid H_0)$ is calculated in Example A. In order to calculate $P(\text{data}\mid H_1)$, we first determine the transition counts $k_e$ for our data (see Table 4) and also $k_v = n_v - \delta_a(v)$ (the sequence in Table 2 ends with the letter a):

$$k_a = 620,\quad k_c = 974,\quad k_g = 1064,\quad k_t = 711. \tag{58}$$

We abbreviate $E' = \{\{a,c\},\{a,g\},\{a,t\},\{c,g\},\{c,t\},\{g,t\}\}$. The sequence starts at $v_0 = t$, and every vertex of the complete graph with loops has $a_v = 4$. By the first part of Proposition 4.7,

$$P(\text{data}\mid H_1) = \frac{\Bigl[\prod_{e\in E'}\prod_{i=0}^{k_e-1}(1+i)\Bigr]\Bigl[\prod_{j\in\{a,c,g,t\}}\prod_{i=0}^{k_{\{j\}}/2-1}(1+2i)\Bigr]}{\prod_{i=0}^{k_t-1}(4+2i)\;\prod_{j\in\{a,c,g\}}\prod_{i=0}^{k_j-1}(5+2i)}$$

$$= \frac{373!\,512!\,174!\,385!\,488!\,455!\;\prod_{i=0}^{90}(1+2i)\prod_{i=0}^{350}(1+2i)\prod_{i=0}^{387}(1+2i)\prod_{i=0}^{151}(1+2i)}{\prod_{i=0}^{710}(4+2i)\prod_{i=0}^{619}(5+2i)\prod_{i=0}^{973}(5+2i)\prod_{i=0}^{1063}(5+2i)} \approx 2.166939224648291\cdot 10^{-1961}. \tag{59}$$

So the Bayes factor is

$$\frac{P(\text{data}\mid H_0)}{P(\text{data}\mid H_1)} \approx 5.2628\cdot 10^{-39},$$

and the null hypothesis is strongly rejected.
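Formula (59) can likewise be evaluated as a sum of logarithms. The Python sketch below (hypothetical helper names) implements the right-hand side of (51) for given counts and prior weights and applies it to Table 4 and (58), with the diagonal of Table 4 counting each loop traversal twice. It recomputes $P(\text{data}\mid H_0)$ from Example A so that the block is self-contained.

```python
from math import lgamma, log


def log10_formula_51(k_edge, k_vertex, v0, a_edge, a_vertex):
    """log10 of the right-hand side of (51).

    k_edge: {(u, v): k_e}, loops written (v, v) and counted twice per traversal;
    k_vertex: {v: k_v}; a_edge and a_vertex: prior edge and vertex weights."""
    total = 0.0
    for (u, v), k in k_edge.items():
        a = a_edge[(u, v)]
        if u == v:  # loop: k_e / 2 factors a_e + 2i
            total += sum(log(a + 2 * i) for i in range(k // 2))
        else:       # ordinary edge: k_e factors a_e + i
            total += sum(log(a + i) for i in range(k))
    for v, k in k_vertex.items():
        shift = 0 if v == v0 else 1  # a_{v0} + 2i versus a_v + 1 + 2i
        total -= sum(log(a_vertex[v] + shift + 2 * i) for i in range(k))
    return total / log(10)


# Table 4 (undirected counts) and (58); the walk starts at v0 = t.
k_edge = {("a", "a"): 182, ("c", "c"): 702, ("g", "g"): 776, ("t", "t"): 304,
          ("a", "c"): 373, ("a", "g"): 512, ("a", "t"): 174,
          ("c", "g"): 385, ("c", "t"): 488, ("g", "t"): 455}
k_vertex = {"a": 620, "c": 974, "g": 1064, "t": 711}
a_edge = {e: 1.0 for e in k_edge}    # all edge weights equal to 1
a_vertex = {v: 4.0 for v in "acgt"}  # three ordinary edges and one loop per vertex

log_p_rev = log10_formula_51(k_edge, k_vertex, "t", a_edge, a_vertex)

# P(data | i.i.d. unknown) with a Dirichlet(1, 1, 1, 1) prior, as in Example A.
counts = [621, 974, 1064, 711]
log_p_iid = (lgamma(4) + sum(lgamma(n + 1) for n in counts) - lgamma(sum(counts) + 4)) / log(10)

print("log10 P(data | reversible) =", log_p_rev)
print("log10 Bayes factor (i.i.d. vs reversible) =", log_p_iid - log_p_rev)
```

Summing logarithms term by term is slower than a closed form in `lgamma`, but it mirrors (59) factor by factor, which makes transcription errors easy to spot.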



Table 4
The undirected transition counts $k_{\{i,j\}}$, $i, j \in \{a, c, g, t\}$ (diagonal entries count each loop traversal twice, so $k_{\{i\}} = 2N_{ii}$)

        a     c     g     t
  a   182   373   512   174
  c   373   702   385   488
  g   512   385   776   455
  t   174   488   455   304
Example C. A Bayes test of $H_0$: reversible versus $H_1$: full Markov.

Here we use our conjugate prior on reversible chains with all edge weights equal to 1. In the full Markov case we use a product of Dirichlet(1, 1, 1, 1) measures, one for each row of the transition matrix. This yields

$$P(\text{data}\mid H_1) = \prod_{i\in\{a,c,g,t\}} \frac{\Gamma(4)\prod_{j\in\{a,c,g,t\}}\Gamma(N_{ij}+1)}{\Gamma(k_i+4)}$$

$$= \Gamma(4)^4\,\frac{\Gamma(92)\Gamma(161)\Gamma(262)\Gamma(109)}{\Gamma(624)}\cdot\frac{\Gamma(214)\Gamma(352)\Gamma(162)\Gamma(250)}{\Gamma(978)}\cdot\frac{\Gamma(252)\Gamma(225)\Gamma(389)\Gamma(202)}{\Gamma(1068)}\cdot\frac{\Gamma(67)\Gamma(240)\Gamma(255)\Gamma(153)}{\Gamma(715)} \approx 4.16382063735625\cdot 10^{-1956}.$$

The probability $P(\text{data}\mid H_0)$ was calculated in Example B. Hence,

$$\frac{P(\text{data}\mid H_0)}{P(\text{data}\mid H_1)} \approx 5.20421\cdot 10^{-6}.$$

We see that a straightforward Bayes test rejects reversibility.
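For the full Markov alternative the marginal likelihood is a product of one Dirichlet-multinomial factor per row of Table 3, with no factor for the first letter, exactly as in the formula above. A Python sketch in logarithms follows; the helper name is hypothetical, and the value of $P(\text{data}\mid H_0)$ is copied from Example B rather than recomputed.

```python
from math import lgamma, log, log10

# Table 3: TABLE3[i][j] = number of occurrences of the string ij.
TABLE3 = {
    "a": {"a": 91, "c": 160, "g": 261, "t": 108},
    "c": {"a": 213, "c": 351, "g": 161, "t": 249},
    "g": {"a": 251, "c": 224, "g": 388, "t": 201},
    "t": {"a": 66, "c": 239, "g": 254, "t": 152},
}


def log10_product_dirichlet(N, alpha=1.0):
    """log10 P(data | full Markov) with independent Dirichlet(alpha, ..., alpha) rows."""
    total = 0.0
    for row in N.values():
        d = len(row)
        k_i = sum(row.values())  # departures from state i, as in (58)
        total += lgamma(d * alpha) - d * lgamma(alpha)
        total += sum(lgamma(n + alpha) for n in row.values())
        total -= lgamma(k_i + d * alpha)
    return total / log(10)


log_p_full = log10_product_dirichlet(TABLE3)
log_p_rev = log10(2.166939224648291) - 1961  # P(data | reversible), quoted from Example B
print("log10 P(data | full Markov) =", log_p_full)
print("log10 Bayes factor (reversible vs full) =", log_p_rev - log_p_full)
```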

Example D. A Bayes test of $H_0$: i.i.d. (unknown) versus $H_1$: full Markov. Using the Bayes factors computed above, the null hypothesis is strongly rejected:

$$\frac{P(\text{data}\mid H_0)}{P(\text{data}\mid H_1)} \approx 2.73887\cdot 10^{-44}.$$
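The Bayes factor of Example D is the product of those of Examples B and C, because the marginal likelihood under the reversible model cancels:

$$\frac{P(\text{data}\mid \text{i.i.d.})}{P(\text{data}\mid \text{full})} = \frac{P(\text{data}\mid \text{i.i.d.})}{P(\text{data}\mid \text{rev.})}\cdot\frac{P(\text{data}\mid \text{rev.})}{P(\text{data}\mid \text{full})} \approx (5.2628\cdot 10^{-39})\,(5.20421\cdot 10^{-6}) \approx 2.7389\cdot 10^{-44}.$$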

Of course, an i.i.d. process is a reversible Markov chain. 

In using the Dirichlet prior for testing uniformity with multinomial data and for testing independence in contingency tables, I. J. Good found the symmetric Dirichlet prior with density proportional to $\prod_{i=1}^{n} x_i^{c-1}$ an important tool. Good's many insights into these testing problems may be accessed through his book [9] and the survey article [10].

We have used the analog of the symmetric Dirichlet for the reversible Markov chain context with all edge weights $a_e$ equal to a constant $c$, say.
As c tends to infinity, this prior tends to a point mass supported on the 
simple random walk on the graph. As c tends to zero, this prior tends to an 
improper prior which gives the maximum likelihood as its posterior. 

Good also worked with c-mixtures of symmetric Dirichlet priors. We sus- 
pect that parallel, useful things can be done in our case as well. 
We have not found any literature about statistical analysis of reversible Markov chains with unknown transitions and append two data analytic remarks here. First, under reversibility, the count $N_{vv'}$ of $v$ to $v'$ transitions has the same expectation as the count $N_{v'v}$ of $v'$ to $v$ transitions, namely, $\nu(v)k(v,v')$. This suggests looking at ratios $N_{vv'}/N_{v'v}$ or differences $N_{vv'} - N_{v'v}$. For example, from Table 3, $N_{ac}/N_{ca} = 160/213$, $N_{ag}/N_{ga} = 261/251$, $N_{at}/N_{ta} = 108/66$, $N_{cg}/N_{gc} = 161/224$, $N_{ct}/N_{tc} = 249/239$, $N_{gt}/N_{tg} = 201/254$; most of these are far from 1.
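The ratios and differences quoted above can be tabulated directly from Table 3. A short Python sketch follows (the function name is hypothetical); it assumes every reversed count $N_{wv}$ is positive, as it is here, and makes no attempt at a formal test.

```python
from fractions import Fraction

# Table 3: N[(i, j)] = number of occurrences of the string ij.
N = {
    ("a", "a"): 91, ("a", "c"): 160, ("a", "g"): 261, ("a", "t"): 108,
    ("c", "a"): 213, ("c", "c"): 351, ("c", "g"): 161, ("c", "t"): 249,
    ("g", "a"): 251, ("g", "c"): 224, ("g", "g"): 388, ("g", "t"): 201,
    ("t", "a"): 66, ("t", "c"): 239, ("t", "g"): 254, ("t", "t"): 152,
}


def reversal_summary(N, states="acgt"):
    """For v < w, return the exact ratio N_vw / N_wv and the difference N_vw - N_wv."""
    out = {}
    for v in states:
        for w in states:
            if v < w:
                out[(v, w)] = (Fraction(N[(v, w)], N[(w, v)]), N[(v, w)] - N[(w, v)])
    return out


for (v, w), (ratio, diff) in reversal_summary(N).items():
    print(f"N_{v}{w}/N_{w}{v} = {float(ratio):.3f}, N_{v}{w} - N_{w}{v} = {diff:+d}")
```

The printed ratios are 0.751, 1.040, 1.636, 0.719, 1.042 and 0.791, so only the pairs {a, g} and {c, t} are close to balance.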

In large samples, these counts have limiting normal distributions by results of Höglund [11]. A second data analytic tool would be to estimate the stationary distribution [perhaps by the method of moments estimator $\hat\nu(v) = \frac1n|\{i < n : X_i = v\}|$] and the transition matrix, and then compare $\hat\nu(v)\hat k(v,v')$ with $\hat\nu(v')\hat k(v',v)$.
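A sketch of this second tool in Python, assuming the method-of-moments $\hat\nu$ above and the row-wise maximum likelihood estimate $\hat k(v,v') = N_{vv'}/|\{i<n : X_i = v\}|$ (our choice; the text leaves the transition estimate open). With these two plug-in estimates, $\hat\nu(v)\hat k(v,v') = N_{vv'}/n$ exactly, so the comparison reduces to the differences $N_{vv'} - N_{v'v}$ divided by $n$. The function name is hypothetical, and the demonstration uses only the first ten letters of Table 2.

```python
from collections import Counter


def detailed_balance_gaps(seq):
    """Plug-in comparison of nu(v) k(v, w) with nu(w) k(w, v) from one path X_0, ..., X_n."""
    n = len(seq) - 1                                    # number of observed transitions
    visits = Counter(seq[:-1])                          # |{i < n : X_i = v}|
    pairs = Counter(zip(seq, seq[1:]))                  # N_vw
    nu_hat = {v: c / n for v, c in visits.items()}      # method-of-moments estimate of nu
    k_hat = {(v, w): c / visits[v] for (v, w), c in pairs.items()}  # row-wise MLE of k
    states = sorted(set(seq))
    gaps = {}
    for v in states:
        for w in states:
            if v < w:
                lhs = nu_hat.get(v, 0.0) * k_hat.get((v, w), 0.0)
                rhs = nu_hat.get(w, 0.0) * k_hat.get((w, v), 0.0)
                gaps[(v, w)] = lhs - rhs
    return nu_hat, k_hat, gaps


nu_hat, k_hat, gaps = detailed_balance_gaps("tggtgtagga")  # first ten letters of Table 2
print(gaps)
```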

An interesting problem not tackled here is finding natural priors on the set of reversible Markov chains with a fixed stationary distribution. For definiteness, consider the uniform stationary distribution. Then the problem is to put a prior on $S(n)$, the set of symmetric doubly stochastic $n\times n$ matrices. We make two remarks. First, determining the Euclidean volume of $S(n)$ is a long-standing open problem; see [1] for recent results. Second, $S(n)$ is a compact, convex subset of $\mathbb{R}^{n^2}$. Its extreme points are well known to be the symmetrized permutation matrices (see [17]). Thus, if $\pi$ is a permutation of $n$ letters with $e(\pi)$ the usual $n\times n$ permutation matrix, let $\bar e(\pi) = \frac12[e(\pi) + e(\pi^{-1})]$. The extreme points of $S(n)$ are the matrices $\bar e(\pi)$ as $\pi$ ranges over the permutations in $S_n$. We may put a prior on $S(n)$ by taking a random convex combination of the $\bar e(\pi)$. Alas, $S(n)$ is not a simplex, so symmetric weights on the extreme points may not lead to symmetric measures on $S(n)$.
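The random convex combination mentioned above can be sketched for small $n$ as follows. Drawing Dirichlet$(\alpha,\dots,\alpha)$ weights over all $n!$ permutations is our hypothetical choice, not a recommendation from the text, and the function names are ours. Since $\bar e(\pi) = \bar e(\pi^{-1})$, this labelling gives each $\bar e(\pi)$ with $\pi \neq \pi^{-1}$, on average, twice the weight of $\bar e(\tau)$ for an involution $\tau$.

```python
import itertools
import random


def symmetrized_permutation_matrix(perm):
    """Return (e(pi) + e(pi^{-1})) / 2, where e(pi) has a 1 in position (i, pi(i))."""
    n = len(perm)
    m = [[0.0] * n for _ in range(n)]
    for i, j in enumerate(perm):
        m[i][j] += 0.5  # contribution of e(pi)
        m[j][i] += 0.5  # contribution of e(pi^{-1}), the transpose of e(pi)
    return m


def random_symmetric_doubly_stochastic(n, alpha=1.0, rng=None):
    """Random convex combination of symmetrized permutation matrices.

    Weights are Dirichlet(alpha, ..., alpha) over all n! permutations, drawn as
    normalized Gamma variables; feasible only for small n."""
    rng = rng or random.Random()
    perms = list(itertools.permutations(range(n)))
    g = [rng.gammavariate(alpha, 1.0) for _ in perms]
    total = sum(g)
    m = [[0.0] * n for _ in range(n)]
    for weight, perm in zip(g, perms):
        e = symmetrized_permutation_matrix(perm)
        for i in range(n):
            for j in range(n):
                m[i][j] += (weight / total) * e[i][j]
    return m


matrix = random_symmetric_doubly_stochastic(4, rng=random.Random(1))
for row in matrix:
    print(" ".join(f"{x:.3f}" for x in row))
```

Every output is symmetric with unit row sums, so it lies in $S(n)$; the sketch says nothing about which weights would give a natural or symmetric measure on $S(n)$.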

Acknowledgment. We would like to thank Franz Merkl for some inter- 
esting discussions. 